Abstract

We prove lower bounds for the direct sum problem for two-party bounded error randomised multiple-round communication protocols. Our proofs use the notion of information cost of a protocol, as defined by Chakrabarti et al. [CSWY01] and refined further by Bar-Yossef et al. [BJKS02]. Our main technical result is a 'compression' theorem saying that, for any probability distribution $\mu$ over the inputs, a $k$-round private coin bounded error protocol for a function $f$ with information cost $c$ can be converted into a $k$-round deterministic protocol for $f$ with bounded distributional error and communication cost $O(kc)$. We prove this result using a substate theorem about relative entropy and a rejection sampling argument. Our direct sum result follows from this 'compression' result via elementary information theoretic arguments.

We also consider the direct sum problem in quantum communication. Using a probabilistic argument, we show that quantum messages cannot be compressed in this manner even if they carry little information about the inputs. Hence, new techniques may be necessary to tackle the direct sum problem in quantum communication.

1 Introduction 

We consider the two-party communication complexity of computing a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$. There are two players, Alice and Bob. Alice is given an input $x \in \mathcal{X}$ and Bob is given an input $y \in \mathcal{Y}$. They then exchange messages in order to determine $f(x,y)$. The goal is to devise a protocol that minimises the amount of communication. In the randomised communication complexity model, Alice and Bob are allowed to toss coins and base their actions on the outcomes of these coin tosses, and are required to determine the correct value with high probability for every input. There are two models for randomised protocols: in the private coin model, the coin tosses are private to each player; in the public coin model, the two players share a string that is generated randomly (independently of the input). A protocol in which $k$ messages are exchanged between the two players is called a $k$-round protocol. One also considers protocols in which the two parties each send a message to a referee who determines the answer: this is the simultaneous message model.

The starting point of our work is a recent result of Chakrabarti, Shi, Wirth and Yao [CSWY01] concerning the direct sum problem in communication complexity. For a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, the $m$-fold direct sum is the function $f^m : \mathcal{X}^m \times \mathcal{Y}^m \to \mathcal{Z}^m$, defined by $f^m((x_1, \ldots, x_m), (y_1, \ldots, y_m)) = (f(x_1, y_1), \ldots, f(x_m, y_m))$. One then studies the communication complexity of $f^m$ as the parameter $m$ increases. Chakrabarti et al. [CSWY01] considered the direct sum problem in the bounded error simultaneous message private coin model and showed that for the equality function $\mathrm{EQ}_n : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$, the communication complexity of $\mathrm{EQ}_n^m$ is $\Omega(m)$ times the communication complexity of $\mathrm{EQ}_n$. In fact, their result is more general. Let $R^{\mathrm{sim}}(f)$ be the bounded error simultaneous message private coin communication complexity of $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$, and let $\tilde{R}^{\mathrm{sim}}(f) = \min_S R^{\mathrm{sim}}(f|_{S \times S})$, where $S$ ranges over all subsets of $\{0,1\}^n$ of size at least $\frac{2}{3} \cdot 2^n$.

Theorem ([CSWY01]) $R^{\mathrm{sim}}(f^m) = \Omega(m(\tilde{R}^{\mathrm{sim}}(f) - O(\log n)))$. A similar result holds for two-party bounded error one-round protocols too.

The proof of this result in [CSWY01] had two parts. The first part used the notion of information cost of randomised protocols, which is the mutual information between the inputs (which were chosen with uniform distribution in [CSWY01]) and the transcript of the communication between the two parties. Clearly, the information cost is bounded by the length of the transcript. So, showing lower bounds on the information cost gives a lower bound on the communication complexity. Chakrabarti et al. showed that the information cost is super-additive, that is, the information cost of $f^m$ is at least $m$ times the information cost of $f$. The second part of their argument showed an interesting message compression result for communication protocols. This result can be stated informally as follows: if the message contains at most $a$ bits of information about a player's input, then one can modify the (one-round or simultaneous message) protocol so that the length of the message is $O(a + \log n)$. Thus, one obtains a lower bound on the information cost of $f$ if one has a suitable lower bound on the communication complexity of $f$. By combining this with the first part, we see that the communication complexity of $f^m$ is at least $m$ times this lower bound on the communication complexity of $f$.

In this paper, we examine whether this approach can be employed for protocols with more than one round of communication. Let $R^k_\delta(f)$ denote the $k$-round private coin communication complexity of $f$, where the protocol is allowed to err with probability at most $\delta$ on any input. Let $\mu$ be a probability distribution on the inputs of $f$. Let $C^k_{\mu,\delta}(f)$ denote the deterministic $k$-round communication complexity of $f$, where the protocol errs on at most a $\delta$ fraction of the inputs, according to the distribution $\mu$. Let $C^{k,[]}_\delta(f)$ denote the maximum, over all product distributions $\mu$, of $C^k_{\mu,\delta}(f)$. We prove the following.

Theorem: Let $m, k$ be positive integers, and $\epsilon, \delta > 0$. Let $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$ be a function. Then,

The proof of this result, like the proof in [CSWY01], has two parts: the first part uses a notion of information cost for $k$-round protocols, and the second shows how messages can be compressed in protocols with low information cost. We now informally describe the ideas behind these results. To keep our presentation simple, we will assume that Alice's and Bob's inputs are chosen uniformly at random from their input sets.

The first part of our argument uses the extension of the notion of information cost to $k$-round protocols. The information cost of a $k$-round randomised protocol is the mutual information between the inputs and the transcript. This natural extension, and its refinement to conditional information cost by [BJKS02], has proved fruitful in several other contexts [BJKS02, JRS03]. It is easy to see that it is bounded above by the length of the transcript, so a lower bound on the information cost of protocols gives a lower bound on the randomised communication complexity. The first part of the argument in [CSWY01] is still applicable: the information cost is super-additive; in particular, the $k$-round information cost of $f^m$ is at least $m$ times the $k$-round information cost of $f$.
The main contribution of this work is in the second part of the argument. This part of Chakrabarti et al. [CSWY01] used a technical argument to compress messages by exploiting the fact that they carry little information. Our proof is based on the connection between the mutual information of random variables and the relative entropy of probability distributions (see Section 2 for the definition). Intuitively, it is reasonable to expect that if the message sent by Alice contains little information about her input $X$, then for the various values $x$ of $X$, the conditional distributions on the message, denoted by $P_x$, are similar. In fact, if we use relative entropy to compare distributions, then one can show that the mutual information is the average, taken over $x$, of the relative entropy $S(P_x\|Q)$ of $P_x$ and $Q$, where $Q = \mathbb{E}_x[P_x]$. Thus, if the information between Alice's input and her message is bounded by $a$, then for most $x$, $S(P_x\|Q)$ is $O(a)$ (by Markov's inequality). To exploit this fact, we use the Substate theorem of [JRS02], which states (roughly) that if $S(P_x\|Q) \le a$, then $P_x \le 2^{O(a)} Q$. Using a standard rejection sampling idea, we then show that Alice can restrict herself to a set of just $2^{O(a)} n$ messages; consequently, her messages can be encoded in $O(a + \log n)$ bits. In fact, such a compact set of messages can be obtained by sampling $2^{O(a)} n$ times from the distribution $Q$.

We believe this connection between relative entropy and sampling is an important contribution of this work. Besides giving a more direct proof of the second part of Chakrabarti et al.'s [CSWY01] argument, our approach quickly generalises to two-party bounded error private coin multiple-round protocols, and allows us to prove a message compression result and a direct sum lower bound for such protocols. Direct sum lower bounds for such protocols were not known earlier. In addition, our message compression result and direct sum lower bound for multiple-round protocols hold for protocols computing relations too.

The second part of our argument raises an interesting question in the setting of quantum communication. Can we always make the length of quantum messages comparable to the amount of information they carry about the inputs without significantly changing the error probability of the protocol? That is, for $x \in \{0,1\}^n$, instead of distributions $P_x$ we have density matrices $\rho_x$ such that the expected quantum relative entropy satisfies $\mathbb{E}_x[S(\rho_x\|\rho)] \le a$, where $\rho = \mathbb{E}_x[\rho_x]$. Also, we are given measurements (POVM elements) $M_{xy}$, $x, y \in \{0,1\}^n$. We then wish to replace $\rho_x$ by $\rho'_x$ so that there is a subspace of dimension $n \cdot 2^{O(a/\epsilon)}$ that contains the support of each $\rho'_x$; also, there is a set $A \subseteq \{0,1\}^n$, $|A| \ge \frac{2}{3} \cdot 2^n$, such that for each $(x,y) \in A \times \{0,1\}^n$, $|\mathrm{Tr}\, M_{xy}\rho_x - \mathrm{Tr}\, M_{xy}\rho'_x| \le \epsilon$. Fortunately, the quantum analogue of the Substate theorem has already been proved by Jain, Radhakrishnan and Sen [JRS02]. Unfortunately, it is the rejection sampling argument that does not generalise to the quantum setting. Indeed, we can prove the following strong negative result about the compressibility of quantum information: for a sufficiently large constant $a$, there exist $\rho_x$, $M_{xy}$, $x, y \in \{0,1\}^n$, as above such that any subspace containing the supports of the $\rho'_x$ as above has dimension at least $2^{\Omega(n)}$. This strong negative result suggests that new techniques may be required to tackle the direct sum problem for quantum communication.

1.1 Previous results 

The direct sum problem for communication complexity has been extensively studied in the past (see Kushilevitz and Nisan [KN97]). Let $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be a function. Let $C(f)$ ($R(f)$) denote the deterministic (bounded error private coin randomised) two-party communication complexity of $f$. Feder, Kushilevitz, Naor and Nisan [FKNN95] showed that there exists a partial function $f$ with $C(f) = \Theta(\log n)$, whereas solving $m$ copies takes only $C(f^m) = O(m + \log m \cdot \log n)$. They also showed a lower bound $C(f^m) \ge m(\sqrt{C(f)/2} - \log n - O(1))$ for total functions $f$. For the one-round deterministic model, they showed that $C(f^m) \ge m(C(f) - \log n - O(1))$ even for partial functions. For the two-round deterministic model, Karchmer, Kushilevitz and Nisan [KKN92] showed that $C(f^m) \ge m(C(f) - O(\log n))$ for any relation $f$. Feder et al. [FKNN95] also showed that for the equality problem $R(\mathrm{EQ}_n^m) = O(m + \log n)$.
1.2 Our results

We now state the new results in this paper. 

Result 1 (Compression result, multiple-rounds) Suppose that $\Pi$ is a $k$-round private coin randomised protocol for $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$. Let the average error of $\Pi$ under a probability distribution $\mu$ on the inputs $\mathcal{X} \times \mathcal{Y}$ be $\delta$. Let $X, Y$ denote the random variables corresponding to Alice's and Bob's inputs respectively. Let $T$ denote the complete transcript of messages sent by Alice and Bob. Suppose $I(XY : T) \le a$. Let $\epsilon > 0$. Then, there is another deterministic protocol $\Pi'$ with the following properties:

(a) The communication cost of II' is at most -j_ ^ jjifs; 

(b) The distributional error of $\Pi'$ under $\mu$ is at most $\delta + 2\epsilon$.

Result 2 (Direct sum, multiple-rounds) Let $m, k$ be positive integers, and $\epsilon, \delta > 0$. Let $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$



Result 3 (Quantum incompressibility) Let m, n, d be positive integers and k > 7. Let d > 160^, 1600 • 
• k2^ ln(20d2) < m and 3200 • ■ 2"^^ Ind < n. Let the underlying Hilbert space be C™. There exist n 
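states $\rho_i$ and $n$ orthogonal projections $M_i$, $1 \le i \le n$, such that

(a) for all $i$, $\mathrm{Tr}\, M_i\rho_i = 1$;

(b) $\rho = \frac{1}{n}\sum_i \rho_i = \frac{I}{m}$, where $I$ is the identity operator on $\mathbb{C}^m$;

(c) for all $i$, $S(\rho_i\|\rho) = k$;

(d) for all $d$-dimensional subspaces $W$ of $\mathbb{C}^m$, and for all ordered sets of density matrices $\{\sigma_i\}_{i \in [n]}$ with support in $W$, $|\{i : \mathrm{Tr}\, M_i\sigma_i \le 1/10\}| \ge n/4$.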

Remark: The above result intuitively says that the states $\rho_i$ on $\log m$ qubits cannot be compressed to fewer than $\log d$ qubits with respect to the measurements $M_i$.

1.3 Organisation of the rest of the paper 

Section 2 defines several basic concepts which will be required for the proofs of the main results. In Section 3, we prove a version of the message compression result for bounded error private coin simultaneous message protocols and state the direct sum result for such protocols. Our version is slightly stronger than the one in [CSWY01]. The main ideas of this work (i.e. the use of the Substate theorem and rejection sampling) are already encountered in this section. In Section 4, we prove the compression result for $k$-round bounded error private coin protocols, and state the direct sum result for such protocols. We prove the impossibility of quantum compression in Section 5. Finally, we conclude by mentioning some open problems in Section 6.

2 Preliminaries 

2.1 Information theoretic background 

In this paper, $\ln$ denotes the natural logarithm and $\log$ denotes the logarithm to base 2. All random variables will have finite range. Let $[k] = \{1, \ldots, k\}$. Let $P, Q : [k] \to \mathbb{R}$. The total variation distance (also known as $\ell_1$-distance) between $P$ and $Q$ is defined as $\|P - Q\|_1 = \sum_{i \in [k]} |P(i) - Q(i)|$. We say $P \le Q$ iff $P(i) \le Q(i)$ for all $i \in [k]$. Suppose $X, Y, Z$ are random variables with some joint distribution. The Shannon entropy of $X$ is defined as $H(X) = -\sum_x \Pr[X = x] \log \Pr[X = x]$. The mutual information of $X$ and $Y$ is defined as $I(X : Y) = H(X) + H(Y) - H(XY)$. For $z \in \mathrm{range}(Z)$, $I((X : Y) \mid Z = z)$ denotes the mutual information of $X$ and $Y$ conditioned on the event $Z = z$, i.e. the mutual information arising from the joint distribution of $X, Y$ conditioned on $Z = z$. Define $I((X : Y) \mid Z) = \mathbb{E}_z I((X : Y) \mid Z = z)$. It is readily seen that $I((X : Y) \mid Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)$. For a good introduction to information theory, see e.g. [CT91].
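The definitions above can be checked numerically. The following is a minimal Python sketch (the helper names `entropy`, `marginal`, `mutual_information` and `conditional_mi` are our own, hypothetical choices) that computes $H$, $I(X : Y)$ and $I((X : Y) \mid Z)$ from an explicit joint distribution, using the identity $I((X : Y) \mid Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)$; the example with $Y = X \oplus Z$ is purely illustrative.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, coords):
    """Marginalise a joint distribution over tuples onto the given coordinate indices."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[c] for c in coords)] += p
    return dict(out)


def mutual_information(joint, a, b):
    """I(A : B) = H(A) + H(B) - H(AB); a and b are tuples of coordinate indices."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


def conditional_mi(joint, a, b, c):
    """I((A : B) | C) = H(AC) + H(BC) - H(ABC) - H(C)."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, a + b + c)) - entropy(marginal(joint, c)))


# Illustrative joint distribution of (X, Y, Z): X, Z independent uniform bits, Y = X XOR Z.
joint = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
X, Y, Z = (0,), (1,), (2,)
print(mutual_information(joint, X, Y))   # Y on its own carries no information about X
print(conditional_mi(joint, X, Y, Z))    # given Z, Y determines X (one full bit)
```

In this example $Y$ alone is independent of $X$, but conditioning on $Z$ lets $Y$ reveal $X$ completely, so conditioning can increase mutual information.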

We now recall the definition of an important information theoretic quantity called relative entropy, also 
known as Kullback-Leibler divergence. 

Definition 1 (Relative entropy) Let $P$ and $Q$ be probability distributions on a set $[k]$. The relative entropy of $P$ and $Q$ is given by $S(P\|Q) = \sum_{i \in [k]} P(i) \log \frac{P(i)}{Q(i)}$.

The following facts follow easily from the definitions. 
Fact 1 Let $X, Y, Z, W$ be random variables with some joint distribution. Then,

(a) $I(X : YZ) = I(X : Y) + I((X : Z) \mid Y)$;

(b) $I((XY : Z) \mid W) \ge I(XY : Z) - H(W)$.

Fact 2 Let $(X, M)$ be a pair of random variables with some joint distribution. Let $P$ be the (marginal) probability distribution of $M$, and for each $x \in \mathrm{range}(X)$, let $P_x$ be the conditional distribution of $M$ given $X = x$. Then $I(X : M) = \mathbb{E}_x[S(P_x\|P)]$, where the expectation is taken according to the marginal distribution of $X$.

Thus, if $I(X : M)$ is small, then we can conclude that $S(P_x\|P)$ is small on average.
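Fact 2 can be sanity-checked on a small example. The sketch below assumes a hypothetical three-message protocol in which $X$ is uniform on three values, and computes both sides of $I(X : M) = \mathbb{E}_x[S(P_x\|P)]$ in Python; the conditional distributions are arbitrary illustrative numbers, not taken from any protocol in this paper.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """S(P || Q) = sum_i P(i) log2(P(i) / Q(i)); infinite if P is not supported inside Q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


# Hypothetical example: X uniform on {0, 1, 2}; row x is P_x, the law of M given X = x.
p_x = [1 / 3, 1 / 3, 1 / 3]
P_cond = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.2, 0.2, 0.6]]
# Marginal law of the message: P = E_x[P_x].
P = [sum(p_x[x] * P_cond[x][m] for x in range(3)) for m in range(3)]

# Right-hand side of Fact 2: the average relative entropy E_x[S(P_x || P)].
avg_rel_ent = sum(p_x[x] * relative_entropy(P_cond[x], P) for x in range(3))
# Left-hand side computed directly as I(X : M) = H(X) + H(M) - H(XM).
joint = [p_x[x] * P_cond[x][m] for x in range(3) for m in range(3)]
mi = entropy(p_x) + entropy(P) - entropy(joint)
print(avg_rel_ent, mi)  # equal up to floating-point rounding
```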
Using Jensen's inequality, one can derive the following property of relative entropy. 

Fact 3 (Monotonicity) Let $P$ and $Q$ be probability distributions on the set $[k]$ and $E \subseteq [k]$. Let $D_P = (P(E), 1 - P(E))$ and $D_Q = (Q(E), 1 - Q(E))$ be the two-point distributions determined by $E$. Then, $S(D_P\|D_Q) \le S(P\|Q)$.
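As an illustration of Fact 3 (not a proof), the following Python sketch enumerates every nontrivial event $E \subseteq [4]$ for two arbitrarily chosen distributions and checks that coarse-graining to the two-point distributions $D_P, D_Q$ never increases the relative entropy.

```python
import math
from itertools import combinations


def relative_entropy(p, q):
    """S(P || Q) in bits; assumes Q(i) > 0 wherever P(i) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Arbitrary illustrative distributions on [4] (Q has full support).
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]
full = relative_entropy(P, Q)

# Coarse-grain to the two-point distributions D_P, D_Q for every nontrivial event E.
worst = 0.0
for size in range(1, len(P)):
    for E in combinations(range(len(P)), size):
        pE = sum(P[i] for i in E)
        qE = sum(Q[i] for i in E)
        coarse = relative_entropy([pE, 1 - pE], [qE, 1 - qE])
        worst = max(worst, coarse)
print(full, worst)  # Fact 3 predicts worst <= full
```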

Our main information theoretic tool in this paper is the following theorem (see [JRS02]).

Fact 4 (Substate theorem) Suppose $P$ and $Q$ are probability distributions on $[k]$ such that $S(P\|Q) = a$. Let $r > 1$. Then,

(a) the set $\mathrm{Good} = \{i \in [k] : P(i)/2^{r(a+1)} \le Q(i)\}$ has probability at least $1 - \frac{1}{r}$ in $P$;

(b) There is a distribution $\tilde{P}$ on $[k]$ such that $\|P - \tilde{P}\|_1 \le \frac{2}{r}$ and $\alpha\tilde{P} \le Q$, where $\alpha = \frac{r-1}{r} \cdot 2^{-r(a+1)}$.

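Proof: Let $\mathrm{Bad} = [k] - \mathrm{Good}$. Consider the two-point distributions $D_P = (P(\mathrm{Good}), 1 - P(\mathrm{Good}))$ and $D_Q = (Q(\mathrm{Good}), 1 - Q(\mathrm{Good}))$. By Fact 3, $S(D_P\|D_Q) \le a$, that is,

$$P(\mathrm{Good}) \log \frac{P(\mathrm{Good})}{Q(\mathrm{Good})} + P(\mathrm{Bad}) \log \frac{P(\mathrm{Bad})}{Q(\mathrm{Bad})} \le a.$$

From our definition, $P(\mathrm{Bad})/Q(\mathrm{Bad}) > 2^{r(a+1)}$ whenever Bad is nonempty. Now, $P(\mathrm{Good}) \log \frac{P(\mathrm{Good})}{Q(\mathrm{Good})} \ge P(\mathrm{Good}) \log P(\mathrm{Good}) \ge -1$ (because $x \log x \ge -(\log e)/e > -1$ for $0 \le x \le 1$). Hence $P(\mathrm{Bad}) \cdot r(a+1) - 1 \le a$, so $P(\mathrm{Bad}) \le \frac{1}{r}$, proving part (a). Let $\tilde{P}(i) = P(i)/P(\mathrm{Good})$ for $i \in \mathrm{Good}$ and $\tilde{P}(i) = 0$ otherwise. Then $\|P - \tilde{P}\|_1 = (1 - P(\mathrm{Good})) + P(\mathrm{Bad}) = 2P(\mathrm{Bad}) \le \frac{2}{r}$, and since $P(\mathrm{Good}) \ge \frac{r-1}{r}$, for every $i \in \mathrm{Good}$ we have $\alpha\tilde{P}(i) \le 2^{-r(a+1)} P(i) \le Q(i)$. Thus $\tilde{P}$ satisfies the requirements for part (b). ■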
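The construction in the proof is explicit, so it can be traced on a concrete pair of distributions. The Python sketch below (the function name `substate_check` and the example distributions are hypothetical) computes $a = S(P\|Q)$, the set Good, the renormalised distribution $\tilde{P}$, and checks the conclusions of Fact 4 for $r = 2$; $Q$ is chosen to put very little mass on one outcome so that Bad is nonempty.

```python
import math


def relative_entropy(p, q):
    """S(P || Q) in bits; assumes Q(i) > 0 wherever P(i) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def substate_check(P, Q, r):
    """Follow the proof of Fact 4 for finite distributions P, Q and a parameter r > 1."""
    a = relative_entropy(P, Q)
    threshold = 2 ** (r * (a + 1))
    good = {i for i in range(len(P)) if P[i] / threshold <= Q[i]}
    p_good = sum(P[i] for i in good)
    # Part (b): P~ is P renormalised on Good and zero outside Good.
    P_tilde = [P[i] / p_good if i in good else 0.0 for i in range(len(P))]
    l1 = sum(abs(x - y) for x, y in zip(P, P_tilde))
    alpha = (r - 1) / r * 2 ** (-r * (a + 1))
    dominated = all(alpha * pt <= q for pt, q in zip(P_tilde, Q))
    return a, p_good, l1, dominated


# Illustrative pair: Q puts very little mass on outcome 0, so outcome 0 lands in Bad.
P = [0.01, 0.495, 0.495]
Q = [1e-6, 0.4999995, 0.4999995]
r = 2.0
a, p_good, l1, dominated = substate_check(P, Q, r)
print(a, p_good, l1, dominated)
```

The check mirrors the proof step by step: `p_good` illustrates part (a), while `l1` and `dominated` illustrate the two claims of part (b).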

2.2 Chernoff-Hoeffding bounds 

We will need the following standard Chernoff-Hoeffding bounds on the tails of probability distributions of sequences of bounded, independent, identically distributed random variables. Below, the notation $B(t,q)$ stands for the binomial distribution obtained from $t$ independent tosses of a coin with success probability $q$ for each toss. A randomised predicate $S$ on $[k]$ is a function $S : [k] \to [0,1]$. For proofs of the following bounds, see e.g. [AS00, Corollary A.7, Theorem A.13].

Fact 5

(a) Let $P$ be a probability distribution on $[k]$ and $S$ a randomised predicate on $[k]$. Let $p = \mathbb{E}_{i \sim P}[S(i)]$. Let $Y = (Y_1, \ldots, Y_r)$ be a sequence of $r$ independent random variables, each with distribution $P$. Then,

$$\Pr_Y\left[\left|\frac{1}{r}\sum_{i=1}^{r} S(Y_i) - p\right| \ge \epsilon\right] \le 2\exp(-2\epsilon^2 r).$$

(b) Let $R$ be a random variable with binomial distribution $B(t,q)$. Then,

$$\Pr\left[R < \frac{tq}{2}\right] < \exp\left(-\frac{tq}{8}\right).$$
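To get a feel for how conservative Fact 5(b) is, the following Python sketch computes the exact binomial tail $\Pr[R < tq/2]$ for a few arbitrarily chosen pairs $(t, q)$ and compares it with $\exp(-tq/8)$; it uses only `math.comb` from the standard library (Python 3.8 or later).

```python
import math


def binomial_lower_tail(t, q, cutoff):
    """Exact Pr[R < cutoff] for R ~ B(t, q)."""
    return sum(math.comb(t, j) * q ** j * (1 - q) ** (t - j)
               for j in range(t + 1) if j < cutoff)


# Compare the exact tail Pr[R < tq/2] with the bound exp(-tq/8) from Fact 5(b).
for t, q in [(50, 0.2), (200, 0.1), (1000, 0.05)]:
    exact = binomial_lower_tail(t, q, t * q / 2)
    bound = math.exp(-t * q / 8)
    print(f"t={t}, q={q}: exact={exact:.3e}, bound={bound:.3e}")
```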

2.3 Communication complexity background 

In the two-party private coin randomised communication complexity model [Yao79], two players Alice and Bob are required to collaborate to compute a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$. Alice is given $x \in \mathcal{X}$ and Bob is given $y \in \mathcal{Y}$. Let $\Pi(x,y)$ be the random variable denoting the entire transcript of the messages exchanged by Alice and Bob when following the protocol $\Pi$ on inputs $x$ and $y$. We say $\Pi$ is a $\delta$-error protocol if for all $x$ and $y$, the answer determined by the players is correct with probability (taken over the coin tosses of Alice and Bob) at least $1 - \delta$. The communication cost of $\Pi$ is the maximum length of $\Pi(x,y)$ over all $x$ and $y$, and over all random choices of Alice and Bob. The $k$-round $\delta$-error private coin randomised communication complexity of $f$, denoted $R^k_\delta(f)$, is the communication cost of the best private coin $k$-round $\delta$-error protocol for $f$. When $\delta$ is omitted, we mean that $\delta =$ ^.

We also consider private coin randomised simultaneous protocols in this paper. $R^{\mathrm{sim}}_\delta(f)$ denotes the $\delta$-error private coin randomised simultaneous communication complexity of $f$. When $\delta$ is omitted, we mean that $\delta =$ l-

Let $\mu$ be a probability distribution on $\mathcal{X} \times \mathcal{Y}$. A deterministic protocol $\Pi$ has distributional error $\delta$ if the probability of correctness of $\Pi$, averaged with respect to $\mu$, is at least $1 - \delta$. The $k$-round $\delta$-error distributional communication complexity of $f$, denoted $C^k_{\mu,\delta}(f)$, is the communication cost of the best $k$-round deterministic protocol for $f$ with distributional error $\delta$. $\mu$ is said to be a product distribution if there exist probability distributions $\mu_{\mathcal{X}}$ on $\mathcal{X}$ and $\mu_{\mathcal{Y}}$ on $\mathcal{Y}$ such that $\mu(x,y) = \mu_{\mathcal{X}}(x) \cdot \mu_{\mathcal{Y}}(y)$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$. The $k$-round $\delta$-error product distributional communication complexity of $f$ is defined as $C^{k,[]}_\delta(f) = \sup_\mu C^k_{\mu,\delta}(f)$, where the supremum is taken over all product distributions $\mu$ on $\mathcal{X} \times \mathcal{Y}$. When $\delta$ is omitted, we mean that $\delta =$ ^.

We now recall the definition of the important notion of information cost of a communication protocol from Bar-Yossef et al. [BJKS02].

Definition 2 (Information cost) Let $\Pi$ be a private coin randomised protocol for a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$. Let $\Pi(x,y)$ be the entire message transcript of the protocol on input $(x,y)$. Let $\mu$ be a distribution on $\mathcal{X} \times \mathcal{Y}$, and let the input random variable $(X,Y)$ have distribution $\mu$. The information cost of $\Pi$ under $\mu$ is defined to be $I(XY : \Pi(X,Y))$. The $k$-round $\delta$-error information complexity of $f$ under the distribution $\mu$, denoted by $IC^k_{\mu,\delta}(f)$, is the infimum information cost under $\mu$ of a $k$-round $\delta$-error protocol for $f$. $IC^{\mathrm{sim}}_\delta(f)$ denotes the infimum information cost under the uniform probability distribution on the inputs of a private coin simultaneous $\delta$-error protocol for $f$.

Remark: In Chakrabarti et al. [CSWY01], the information cost of a private coin $\delta$-error simultaneous message protocol $\Pi$ is defined as follows: let $X$ ($Y$) denote the random variable corresponding to Alice's (Bob's) input, and let $M$ ($N$) denote the random variable corresponding to Alice's (Bob's) message to the referee. The information cost of $\Pi$ is defined as $I(X : M) + I(Y : N)$. Our definition of information cost coincides with Chakrabarti et al.'s definition for simultaneous message protocols: under the uniform (hence product) input distribution and with private coins, the pairs $(X, M)$ and $(Y, N)$ are independent, so $I(XY : MN) = I(X : M) + I(Y : N)$.

Let $\mu$ be a probability distribution on $\mathcal{X} \times \mathcal{Y}$. The probability distribution $\mu^m$ on $\mathcal{X}^m \times \mathcal{Y}^m$ is defined as

$\mu^m((x_1, \ldots, x_m), (y_1, \ldots, y_m)) = \mu(x_1, y_1) \cdot \mu(x_2, y_2) \cdots \mu(x_m, y_m)$. Suppose $\mu$ is a product probability distribution on $\mathcal{X} \times \mathcal{Y}$. It can be easily seen (see e.g. [BJKS02]) that for any positive integers $m, k$, and real $\delta > 0$, $IC^k_{\mu^m,\delta}(f^m) \ge m \cdot IC^k_{\mu,\delta}(f)$. The reason for requiring $\mu$ to be a product distribution is as follows. We define the notion of information cost for private coin protocols only. This is because the proof of our message compression theorem (Theorem 3), which makes use of information cost, works for private coin protocols only. If $\mu$ is not a product distribution, the protocol for $f$ which arises out of the protocol for $f^m$ in the proof of the above inequality fails to be a private coin protocol, even if the protocol for $f^m$ was private coin to start with. To get around this restriction on $\mu$, Bar-Yossef et al. [BJKS02] introduced the notion of conditional information cost of a protocol. Suppose the distribution $\mu$ is expressed as a convex combination $\mu = \sum_{d \in K} \kappa_d \mu_d$ of product distributions $\mu_d$, where $K$ is some finite index set. Let $\kappa$ denote the probability distribution on $K$ defined by the numbers $\kappa_d$. Define the random variable $D$ to be distributed according to $\kappa$. Conditioned on $D$, $\mu$ is a product distribution on $\mathcal{X} \times \mathcal{Y}$. We will call $\mu$ a mixture of product distributions $\{\mu_d\}_{d \in K}$ and say that $\kappa$ partitions $\mu$. The probability distribution $\kappa^m$ on $K^m$ is defined as

$\kappa^m(d_1, \ldots, d_m) = \kappa(d_1) \cdot \kappa(d_2) \cdots \kappa(d_m)$. Then $\kappa^m$ partitions $\mu^m$ in a natural way. The random variable $D^m$ has distribution $\kappa^m$. Conditioned on $D^m$, $\mu^m$ is a product distribution on $\mathcal{X}^m \times \mathcal{Y}^m$.

Definition 3 (Conditional information cost) Let $\Pi$ be a private coin randomised protocol for a function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$. Let $\Pi(x,y)$ be the entire message transcript of the protocol on input $(x,y)$. Let $\mu$ be a distribution on $\mathcal{X} \times \mathcal{Y}$, and let the input random variable $(X,Y)$ have distribution $\mu$. Let $\mu$ be a mixture of product distributions partitioned by $\kappa$. Let the random variable $D$ be distributed according to $\kappa$. The conditional information cost of $\Pi$ under $(\mu, \kappa)$ is defined to be $I((XY : \Pi(X,Y)) \mid D)$. The $k$-round $\delta$-error conditional information complexity of $f$ under $(\mu, \kappa)$, denoted by $IC^k_{\mu,\delta}(f \mid \kappa)$, is the infimum conditional information cost under $(\mu, \kappa)$ of a $k$-round $\delta$-error protocol for $f$.

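The following facts follow easily from the results in Bar-Yossef et al. [BJKS02] and Fact 1.

Fact 6 Let $\mu$ be a probability distribution on $\mathcal{X} \times \mathcal{Y}$. Let $\kappa$ partition $\mu$. For any $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}$, positive integers $m, k$, and real $\delta > 0$, $IC^k_{\mu^m,\delta}(f^m \mid \kappa^m) \ge m \cdot IC^k_{\mu,\delta}(f \mid \kappa) \ge m \cdot (IC^k_{\mu,\delta}(f) - H(\kappa))$.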
Fact 7 With the notation and assumptions of Fact 6, $R^k_\delta(f) \ge IC^k_{\mu,\delta}(f \mid \kappa)$.
2.4 Sampling uniformly random orthonormal sets of vectors 

To prove our result about the incompressibility of quantum information, we need to define the notion of a uniformly random set of $d$ orthonormal vectors from $\mathbb{C}^m$. Let $U(m)$ denote the group (under matrix multiplication) of $m \times m$ complex unitary matrices. Being a compact topological group, it has a unique Haar probability measure on its Borel sets which is both left and right invariant under multiplication by unitary matrices (see e.g. [Roy88, Chapter 14, Corollary 20]). Let $\mathcal{U}_{m,d}$ ($1 \le d \le m$) denote the topological space of $m \times d$ complex matrices with orthonormal columns. $\mathcal{U}_{m,d}$ is compact, and the group $U(m)$ acts on $\mathcal{U}_{m,d}$ via multiplication from the left. Let $f_{m,d} : U(m) \to \mathcal{U}_{m,d}$ be the map obtained by discarding the last $m - d$ columns of a unitary matrix. $f_{m,d}$ induces a probability measure $\mu_{m,d}$ on the Borel sets of $\mathcal{U}_{m,d}$ from the Haar probability measure on $U(m)$. $\mu_{m,d}$ is invariant under the action of $U(m)$, and is in fact the unique $U(m)$-invariant probability measure on the Borel sets of $\mathcal{U}_{m,d}$ (see e.g. [Roy88, Chapter 14, Theorem 25]). By a uniformly random ordered set $(v_1, \ldots, v_d)$, $1 \le d \le m$, of orthonormal vectors from $\mathbb{C}^m$, we mean an element of $\mathcal{U}_{m,d}$ chosen according to $\mu_{m,d}$. By a uniformly random $d$-dimensional subspace $V$ of $\mathbb{C}^m$, we mean a subspace $V = \mathrm{Span}(v_1, \ldots, v_d)$, where $(v_1, \ldots, v_d)$ is a uniformly random ordered set of orthonormal vectors from $\mathbb{C}^m$.

Let $O(m)$ denote the group (under matrix multiplication) of $m \times m$ real orthogonal matrices. Identify $\mathbb{C}^m$ with $\mathbb{R}^{2m}$ by treating a complex number as a pair of real numbers. A uniformly random unit vector in $\mathbb{C}^m$ (i.e. a vector distributed according to $\mu_{m,1}$) is the same as a uniformly random unit vector in $\mathbb{R}^{2m}$: since $U(m)$ is contained in $O(2m)$, the $O(2m)$-invariant measure on the unit sphere is $U(m)$-invariant, and hence, by uniqueness, equals $\mu_{m,1}$. From now on, while considering metric and measure theoretic properties of $\mathcal{U}_{m,1}$, it may help to keep the above identification of $\mathbb{C}^m$ and $\mathbb{R}^{2m}$ in mind.

One way of generating a uniformly random unit vector in $\mathbb{R}^m$ is as follows. First choose $(y_1, \ldots, y_m)$ independently, each $y_i$ being chosen according to the one-dimensional Gaussian distribution with mean 0 and variance 1 (i.e. a real valued random variable with probability density function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$). Normalise to get the unit vector $(x_1, \ldots, x_m)$, where $x_i = y_i / \sqrt{\sum_{j=1}^m y_j^2}$ (note that all the $y_j$ are 0 only with probability zero). Since the joint density $(2\pi)^{-m/2} e^{-\|y\|^2/2}$ of $(y_1, \ldots, y_m)$ depends only on $\|y\|$, the resulting distribution on unit vectors is $O(m)$-invariant, and hence the above process generates a uniformly random unit vector in $\mathbb{R}^m$.
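The Gaussian construction above translates directly into code. This is a minimal Python sketch using the standard library's `random.gauss`; the function name `random_unit_vector` is our own. The final check, that the average of $x_1^2$ is close to $1/m$, is only a sanity test consistent with rotational invariance, not a proof of uniformity.

```python
import math
import random


def random_unit_vector(m, rng=random):
    """Uniformly random unit vector in R^m, obtained by normalising m i.i.d. N(0, 1) samples."""
    while True:
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm > 0:  # the all-zero draw has probability zero, but guard against it anyway
            return [v / norm for v in y]


rng = random.Random(0)
v = random_unit_vector(5, rng)
print(sum(c * c for c in v))  # 1 up to rounding

# Sanity check consistent with rotational invariance: E[x_1^2] = 1/m.
samples = [random_unit_vector(5, rng)[0] ** 2 for _ in range(20000)]
print(sum(samples) / len(samples))  # close to 1/5
```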

From the above discussion, one can prove the following fact. 

Fact 8

(a) Let $1 \le d \le m$. Let $(v_1, \ldots, v_d)$ be distributed according to $\mu_{m,d}$. Then for each $i$, $v_i$ is distributed according to $\mu_{m,1}$, and for each $i \ne j$, $(v_i, v_j)$ is distributed according to $\mu_{m,2}$.

(b) Suppose $x, y$ are independent unit vectors, each distributed according to $\mu_{m,1}$. Let $w'' = y - \langle x|y\rangle x$, and set $w = x$ and $w' = w''/\|w''\|$ (note that $w'' = 0$ with probability zero). Then the pair $(w, w')$ is distributed according to $\mu_{m,2}$.

(c) Suppose $x, y$ are independent unit vectors, each distributed according to $\mu_{m,1}$. Let $V$ be a subspace of $\mathbb{C}^m$ and define $\hat{x} = Px/\|Px\|$, $\hat{y} = Py/\|Py\|$, where $P$ is the orthogonal projection operator onto $V$ (note that $Px = 0$ and $Py = 0$ are each zero probability events). Then $\hat{x}, \hat{y}$ are uniformly random independent unit vectors in $V$.
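Fact 8(b) is essentially one step of Gram-Schmidt orthonormalisation applied to two independent uniformly random vectors. The Python sketch below (function names hypothetical) samples unit vectors in $\mathbb{C}^m$ via the identification with $\mathbb{R}^{2m}$, i.e. from $2m$ real Gaussians, and forms $w = x$ and $w' = w''/\|w''\|$; it only verifies orthonormality of the output, not its distribution.

```python
import math
import random


def random_complex_unit_vector(m, rng):
    """Unit vector in C^m drawn via the identification C^m = R^{2m} (2m real Gaussians)."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]


def inner(x, y):
    """<x|y>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def random_orthonormal_pair(m, rng):
    """Fact 8(b): w = x and w' = w''/||w''|| with w'' = y - <x|y> x."""
    x = random_complex_unit_vector(m, rng)
    y = random_complex_unit_vector(m, rng)
    c = inner(x, y)
    w2 = [b - c * a for a, b in zip(x, y)]
    n2 = math.sqrt(sum(abs(t) ** 2 for t in w2))  # nonzero with probability one
    return x, [t / n2 for t in w2]


rng = random.Random(42)
w, w_prime = random_orthonormal_pair(4, rng)
print(abs(inner(w, w_prime)), abs(inner(w_prime, w_prime)))  # approximately 0 and 1
```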
We will need to 'discretise' the set of $d$-dimensional subspaces of $\mathbb{C}^m$. The discretisation is done by using a $\delta$-dense subset of $\mathcal{U}_{m,1}$. A subset $\mathcal{N}$ of $\mathcal{U}_{m,1}$ is said to be $\delta$-dense if each vector $v \in \mathcal{U}_{m,1}$ has some vector in $\mathcal{N}$ at distance no larger than $\delta$ from it. We require the following fact about $\delta$-dense subsets of $\mathcal{U}_{m,1}$.

Fact 9 ([Mat02, Chapter 13, Lemma 13.1.1]) For each $0 < \delta \le 1$, there is a $\delta$-dense subset $\mathcal{N}$ of $\mathcal{U}_{m,1}$ satisfying $|\mathcal{N}| \le (4/\delta)^{2m}$.

A mapping $f$ between two metric spaces is said to be 1-Lipschitz if the distance between $f(x)$ and $f(y)$ is never larger than the distance between $x$ and $y$. The following fact says that a 1-Lipschitz function $f : \mathcal{U}_{m,1} \to \mathbb{R}$ greatly exceeds its expectation only with very low probability. It follows by combining Theorem 14.3.2 and Proposition 14.3.3 of [Mat02, Chapter 14].

Fact 10 Let $f : \mathcal{U}_{m,1} \to \mathbb{R}$ be 1-Lipschitz. Then for all $0 \le t \le 1$, $\Pr[f \ge \mathbb{E}[f] + t + 12/\sqrt{2m}] \le 2\exp(-t^2 m)$.

2.5 Quantum information theoretic background 

We consider a quantum system with Hilbert space $\mathbb{C}^m$. For Hermitian operators $A, B$ on $\mathbb{C}^m$, $A \le B$ is shorthand for the statement "$B - A$ is positive semidefinite". A POVM element $M$ over $\mathbb{C}^m$ is a Hermitian operator satisfying $0 \le M \le I$, where $0, I$ are the zero and identity operators respectively on $\mathbb{C}^m$. For a POVM element $M$ over $\mathbb{C}^m$ and a subspace $W$ of $\mathbb{C}^m$, define $M(W) = \max_{w \in W : \|w\| = 1} \langle w|M|w\rangle$.

For subspaces $W, W'$ of $\mathbb{C}^m$, define $\Delta(W, W') = \max_M |M(W) - M(W')|$, where the maximum is taken over all POVM elements $M$ over $\mathbb{C}^m$. $\Delta(W, W')$ is a measure of how well one can distinguish between the subspaces $W, W'$ via a measurement. For a good introduction to quantum information theory, see [NC00]. The following fact can be proved from the results in [AKN98].

Fact 11 Let $M$ be a POVM element over $\mathbb{C}^m$ and let $w, w'$ be unit vectors. Then $|\langle w|M|w\rangle - \langle w'|M|w'\rangle| \le \|w - w'\|$.
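Fact 11 can be spot-checked numerically for the special case of rank-one projectors $M = |u\rangle\langle u|$ (which are POVM elements), where $\langle w|M|w\rangle = |\langle u|w\rangle|^2$. The following Python sketch, with an arbitrary dimension, seed and perturbation size, records the largest observed ratio $|\langle w|M|w\rangle - \langle w'|M|w'\rangle| / \|w - w'\|$; Fact 11 says it can never exceed 1. Random testing of this kind is illustrative only and covers just this special class of $M$.

```python
import math
import random


def random_complex_unit_vector(m, rng):
    """Unit vector in C^m drawn from 2m real Gaussians and normalised."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    return [c / norm for c in z]


def inner(x, y):
    """<x|y>, conjugate-linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def expectation_rank_one(u, w):
    """<w|M|w> for the rank-one projector M = |u><u|, which satisfies 0 <= M <= I."""
    return abs(inner(u, w)) ** 2


rng = random.Random(7)
worst_ratio = 0.0
for _ in range(2000):
    u = random_complex_unit_vector(3, rng)
    w = random_complex_unit_vector(3, rng)
    # Build a nearby unit vector w' by perturbing w and renormalising.
    d = random_complex_unit_vector(3, rng)
    raw = [a + 0.1 * b for a, b in zip(w, d)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in raw))
    w_prime = [c / norm for c in raw]
    lhs = abs(expectation_rank_one(u, w) - expectation_rank_one(u, w_prime))
    dist = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(w, w_prime)))
    worst_ratio = max(worst_ratio, lhs / dist)
print(worst_ratio)  # Fact 11 says this never exceeds 1
```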

A density matrix $\rho$ over $\mathbb{C}^m$ is a Hermitian, positive semidefinite operator on $\mathbb{C}^m$ with unit trace. If $A$ is a quantum system with Hilbert space $\mathbb{C}^m$ having density matrix $\rho$, then $S(A) = S(\rho) = -\mathrm{Tr}\, \rho \log \rho$ is the von Neumann entropy of $A$. If $A, B$ are two disjoint quantum systems, the mutual information of $A$ and $B$ is defined as $I(A : B) = S(A) + S(B) - S(AB)$. For density matrices $\rho, \sigma$ over $\mathbb{C}^m$, their relative entropy is defined as $S(\rho\|\sigma) = \mathrm{Tr}\, \rho(\log \rho - \log \sigma)$. Let $X$ be a classical random variable with finite range and $M$ be an $m$-dimensional quantum encoding of $X$, i.e. for every $x \in \mathrm{range}(X)$ there is a density matrix $\sigma_x$ over $\mathbb{C}^m$ ($\sigma_x$ represents a 'quantum encoding' of $x$). Let $\sigma = \mathbb{E}_x \sigma_x$, where the expectation is taken over the (marginal) probability distribution of $X$. Then, $I(X : M) = \mathbb{E}_x S(\sigma_x\|\sigma)$.

3 Simultaneous message protocols 

In this section, we prove a result of [CSWY01], which states that if the mutual information between the message and the input is at most $a$, then the protocol can be modified so that the players send messages of length at most $O(a + \log n)$ bits. Our proof will make use of the Substate theorem and a rejection sampling argument. In the next section, we will show how to extend this argument to multiple-round protocols.
Before we formally state the result and its proof, let us outline the main idea. Fix a simultaneous message protocol for computing the function $f : \{0,1\}^n \times \{0,1\}^n \to \mathcal{Z}$. Let $X$ be uniformly distributed on $\{0,1\}^n$. Suppose $I(X : M) \le a$, where $M$ is the message sent by Alice to the referee when her input is $X$. Let $s_{xy}(m)$ be the conditional probability that the referee computes $f(x,y)$ correctly when Alice's message is $m$, her input is $x$ and Bob's input is $y$.

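We want to show that we can choose a small subset $\mathcal{M}$ of possible messages, so that for most $x$, Alice can generate a message $M'_x$ from this subset (according to some distribution that depends on $x$), and still ensure that $\mathbb{E}[s_{xy}(M'_x)]$ is close to 1 for all $y$. Let $P_x$ be the distribution of $M$ conditioned on the event $X = x$. For a fixed $x$, it is possible to argue that we can confine Alice's messages to a certain small subset $\mathcal{M}_x \subseteq [k]$. Let $\mathcal{M}_x$ consist of $O(n)$ messages picked according to the distribution $P_x$. Then, instead of sending messages according to the distribution $P_x$, Alice can send a uniformly random message chosen from $\mathcal{M}_x$. Using Chernoff-Hoeffding bounds (Fact 5(a)), one can easily verify that $\mathcal{M}_x$ will serve our purposes with exponentially high probability: for each fixed $y$, the average of $s_{xy}$ over $\mathcal{M}_x$ deviates from its expectation by more than $\epsilon$ only with probability $2^{-\Omega(n)}$ (for a suitably large constant in the $O(n)$), so a union bound over all $2^n$ values of $y$ applies.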

However, what we really require is a collection of samples $\{\mathcal{M}_x\}$ whose union is small, so that Alice and the referee can settle on a common succinct encoding for the messages. Why should such samples exist? Since $I(X : M)$ is small, we have by Fact 2 that for most $x$, the relative entropy $S(P_x\|Q)$ is bounded (here $Q$ is the distribution of the message $M$, i.e., $Q = \mathbb{E}_x[P_x]$). By combining this fact, the Substate theorem (Fact 4) and a rejection sampling argument (see e.g. [Ros97, Chapter 4, Section 4.4]), one can show that if we choose a sample of messages according to the distribution $Q$, then, for most $x$, roughly one in every $2^{O(a)}$ messages 'can serve' as a message sampled according to the distribution $P_x$. Thus, if we pick a sample of size $n \cdot 2^{O(a)}$ according to $Q$, then for most $x$ we can obtain the required subsample $\mathcal{M}_x$ of $O(n)$ elements. The formal arguments are presented below.

The following easy lemma is the basis of the rejection sampling argument. 

Lemma 1 (Rejection sampling) Let $P$ and $Q$ be probability distributions on $[k]$ such that $2^{-a}P \le Q$. Then, there exist correlated random variables $X$ and $\chi$ taking values in $[k] \times \{0,1\}$, such that: (a) $X$ has distribution $Q$, (b) $\Pr[\chi = 1] = 2^{-a}$ and (c) $\Pr[X = i \mid \chi = 1] = P(i)$.

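Proof: Since the distribution of $X$ is required to be $Q$, we will just describe the conditional distribution of $\chi$ for each potential value $i$ for $X$: let $\Pr[\chi = 1 \mid X = i] = P(i)/(2^a Q(i))$ (this is at most 1 because $2^{-a}P \le Q$). Then,

$$\Pr[\chi = 1] = \sum_{i \in [k]} \Pr[X = i] \cdot \Pr[\chi = 1 \mid X = i] = \sum_{i \in [k]} 2^{-a}P(i) = 2^{-a}$$

and

$$\Pr[X = i \mid \chi = 1] = \frac{\Pr[X = i] \cdot \Pr[\chi = 1 \mid X = i]}{\Pr[\chi = 1]} = \frac{2^{-a}P(i)}{2^{-a}} = P(i).$$

■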
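The construction in the proof of Lemma 1 is easy to check mechanically. The Python sketch below, written under the assumption that $P$ and $Q$ are given as lists of exact rationals indexed by $[k]$ and that $a$ is a non-negative integer, computes the acceptance probabilities $P(i)/(2^aQ(i))$, verifies $\Pr[\chi = 1] = 2^{-a}$ and $\Pr[X = i \mid \chi = 1] = P(i)$ exactly, and draws one pair $(X, \chi)$. The function names are illustrative only.

```python
from fractions import Fraction
import random

def acceptance_probs(P, Q, a):
    """Pr[chi = 1 | X = i] = P(i) / (2^a Q(i)), as in the proof of Lemma 1.
    P, Q: lists of Fractions indexed by the elements of [k]; a: integer >= 0."""
    scale = Fraction(2) ** a
    probs = []
    for p, q in zip(P, Q):
        if q == 0:
            # 2^-a P <= Q forces P(i) = 0 here, so i is never drawn or accepted
            assert p == 0, "hypothesis 2^-a P <= Q violated"
            probs.append(Fraction(0))
        else:
            r = p / (scale * q)
            assert r <= 1, "hypothesis 2^-a P <= Q violated"
            probs.append(r)
    return probs

def lemma1_exact(P, Q, a):
    """Exact Pr[chi = 1] and the conditional law Pr[X = i | chi = 1]."""
    acc = acceptance_probs(P, Q, a)
    p_accept = sum(q * r for q, r in zip(Q, acc))
    return p_accept, [q * r / p_accept for q, r in zip(Q, acc)]

def lemma1_sample(P, Q, a, rng):
    """Draw one pair (X, chi): X ~ Q, then chi = 1 with probability P(X) / (2^a Q(X))."""
    acc = acceptance_probs(P, Q, a)
    x = rng.choices(range(len(Q)), weights=[float(q) for q in Q])[0]
    chi = 1 if rng.random() < float(acc[x]) else 0
    return x, chi
```

For example, with $Q$ uniform on four points and $P$ uniform on the first two, $2^{-1}P \le Q$ holds, every draw of the first two points is accepted, and exactly half of all draws are accepted.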

In order to combine this argument with the Substate Theorem to generate simultaneously a sample $\mathcal{M}$ of messages according to the distribution $Q$ and several subsamples $\mathcal{M}_x$, we will need a slight extension of the above lemma.

Lemma 2 Let $P$ and $Q$ be probability distributions on $[k]$ such that $2^{-a}P \le Q$. Then, for each integer $t \ge 1$, there exist correlated random variables $X = (X_1, X_2, \ldots, X_t)$ and $Y = (Y_1, Y_2, \ldots, Y_R)$ such that

(a) The random variables $(X_i : i \in [t])$ are independent and each $X_i$ has distribution $Q$;

(b) $R$ is a random variable with binomial distribution $B(t, 2^{-a})$;

(c) Conditioned on the event $R = r$, the random variables $(Y_i : i \in [r])$ are independent and each $Y_i$ has distribution $P$.

(d) $Y$ is a subsequence of $X$ (with probability 1).

Proof: We generate $t$ independent copies of the random variables $(X, \chi)$ promised by Lemma 1; this gives us $X = (X_1, X_2, \ldots, X_t)$ and $\chi = (\chi_1, \chi_2, \ldots, \chi_t)$. Let $Y = (X_i : \chi_i = 1)$. It is easy to verify that $X$ and $Y$ satisfy conditions (a)-(d). ■
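The proof of Lemma 2 amounts to running the Lemma 1 experiment $t$ times and keeping the accepted draws. A minimal Python sketch, assuming $P$ and $Q$ are lists of floats with $2^{-a}P \le Q$ pointwise (not checked here); `lemma2_sample` and `is_subsequence` are illustrative names:

```python
import random

def lemma2_sample(P, Q, a, t, rng):
    """t independent copies of (X_i, chi_i) as in Lemma 1; Y keeps the X_i with chi_i = 1.
    P, Q: lists of floats on [k] with 2**-a * P <= Q pointwise (assumed, not checked)."""
    k = len(Q)
    xs, chis = [], []
    for _ in range(t):
        x = rng.choices(range(k), weights=Q)[0]
        # acceptance probability P(x) / (2^a Q(x)) from the proof of Lemma 1
        accept = P[x] / (2 ** a * Q[x])
        xs.append(x)
        chis.append(1 if rng.random() < accept else 0)
    ys = [x for x, c in zip(xs, chis) if c == 1]
    return xs, ys  # R = len(ys) is Binomial(t, 2^-a)

def is_subsequence(sub, seq):
    """True if sub occurs in seq in order (condition (d) of Lemma 2)."""
    it = iter(seq)
    return all(any(s == v for v in it) for s in sub)
```

The subsequence property holds by construction, and the length of `ys` concentrates around $t \cdot 2^{-a}$, as part (b) predicts.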
Our next lemma uses Lemma 2 to pick a sample of messages according to the average distribution $Q$ and find sub-samples inside it for several distributions $P_i$. This lemma will be crucial to show the compression result for simultaneous message protocols (Theorem 1).



Lemma 3 Let $Q$ and $P_1, P_2, \ldots, P_N$ be probability distributions on $[k]$. Define $a_i = S(P_i \| Q)$. Suppose $a_i < \infty$ for all $i \in [N]$. For $i \in [N]$, let $s_{i1}, s_{i2}, \ldots, s_{iN}$ be functions from $[k]$ to $[0,1]$. (In our application, they will correspond to the conditional probability that the referee gives the correct answer when Alice sends a certain message from $[k]$.) Let $p_{ij} = \mathbb{E}_{y \leftarrow_{P_i} [k]}[s_{ij}(y)]$. Fix $\epsilon \in (0,1]$. Then, there exists a sequence $x = (x_1, \ldots, x_t)$ of elements of $[k]$ and subsequences $y^1, \ldots, y^N$ of $x$ such that

(a) $y^i$ is a subsequence of $(x_1, \ldots, x_{t_i})$, where $t_i = \dfrac{8 \cdot 2^{(a_i+1)/\epsilon} \cdot \log(2N)}{\epsilon^2(1-\epsilon)}$;

(b) For $i, j = 1, 2, \ldots, N$,
$$\left| \frac{1}{r_i} \sum_{l=1}^{r_i} s_{ij}(y^i_l) - p_{ij} \right| \le 2\epsilon,$$
where $r_i$ is the length of $y^i$;

(c) $t = \max_i t_i$.

Proof: Using part (b) of Fact 3, we obtain distributions $\tilde P_i$ such that

$$\|P_i - \tilde P_i\| \le 2\epsilon \quad\text{and}\quad (1-\epsilon)2^{-(a_i+1)/\epsilon}\,\tilde P_i \le Q, \qquad \forall i \in [N].$$

Using Lemma 2, we can construct correlated random variables $(X, Y^1, Y^2, \ldots, Y^N)$ such that $X$ is a sequence of $t = \max_i t_i$ independent random variables, each distributed according to $Q$, and $(X[1, t_i], Y^i)$ satisfies conditions (a)-(d) of Lemma 2 (with $P = \tilde P_i$, $a = (a_i+1)/\epsilon - \log(1-\epsilon)$ and $t = t_i$). We will show that with non-zero probability these random variables satisfy conditions (a) and (b) of the present lemma. This implies that there is a choice $(x, y^1, \ldots, y^N)$ for $(X, Y^1, \ldots, Y^N)$ satisfying parts (a) and (b) of the present lemma.

Let $R_i$ denote the length of $Y^i$; by Lemma 2(b), $\mathbb{E}[R_i] = t_i(1-\epsilon)2^{-(a_i+1)/\epsilon} = (8/\epsilon^2)\log(2N)$. Using part (b) of Fact 5, $\Pr[\exists i,\ R_i < (4/\epsilon^2)\log(2N)] < N \cdot \frac{1}{2N} = \frac{1}{2}$.

Now, condition on the event $R_i \ge (4/\epsilon^2)\log(2N)$ for all $1 \le i \le N$. Define $\tilde p_{ij} = \mathbb{E}_{y \leftarrow_{\tilde P_i} [k]}[s_{ij}(y)]$. We use part (a) of Fact 5 to conclude that

$$\Pr\left[\left|\frac{1}{R_i}\sum_{l=1}^{R_i} s_{ij}(Y^i_l) - \tilde p_{ij}\right| > \epsilon\right] < \frac{2}{(2N)^8}, \qquad \forall i, j = 1, \ldots, N, \tag{1}$$

implying that

$$\Pr\left[\exists i, j:\ \left|\frac{1}{R_i}\sum_{l=1}^{R_i} s_{ij}(Y^i_l) - \tilde p_{ij}\right| > \epsilon\right] < \frac{2N^2}{(2N)^8} < \frac{1}{2}. \tag{2}$$

From the bound on the $R_i$, (2) and the fact that $|\tilde p_{ij} - p_{ij}| \le \epsilon$ (since $\|P_i - \tilde P_i\| \le 2\epsilon$), it follows that part (b) of our lemma holds with non-zero probability. Part (a) is never violated. Part (c) is true by definition of $t$. ■
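The key feature of Lemma 3 is that a single sample drawn from $Q$ contains, simultaneously, good sub-samples for many distributions $P_i$. The sketch below illustrates this in Python under a simplifying assumption: the inputs already satisfy $2^{-\alpha_i}P_i \le Q$, which in the proof is what the smoothed distributions $\tilde P_i$ from the Substate Theorem provide. The functions `shared_sample_with_subsamples` and `empirical_mean` are hypothetical names used only for this illustration.

```python
import random

def shared_sample_with_subsamples(Ps, Q, alphas, ts, rng):
    """Draw x = (x_1, ..., x_t), t = max(ts), i.i.d. from Q, and for each i a subsequence
    y^i of the prefix x[:ts[i]] by Lemma 1-style rejection with threshold 2**alphas[i].
    Assumes 2**-alphas[i] * Ps[i] <= Q pointwise; in the proof of Lemma 3 this role is
    played by the smoothed distributions obtained from the Substate Theorem."""
    k = len(Q)
    x = rng.choices(range(k), weights=Q, k=max(ts))
    subs = []
    for P, alpha, t_i in zip(Ps, alphas, ts):
        # fresh acceptance coins for each (i, position), as in Lemma 2
        y = [v for v in x[:t_i] if rng.random() < P[v] / (2 ** alpha * Q[v])]
        subs.append(y)
    return x, subs

def empirical_mean(s, y):
    """(1/r) * sum_l s(y_l), the quantity that part (b) of Lemma 3 compares with p_ij."""
    return sum(s(v) for v in y) / len(y)
```

Each list in `subs` is a subsequence of the shared sample, and the empirical average of a test function over it approaches its expectation under the corresponding $P_i$.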

Theorem 1 (Compression result, simultaneous messages) Suppose that $\Pi$ is a $\delta$-error private coin simultaneous message protocol for $f : \{0,1\}^n \times \{0,1\}^n \to Z$. Let the inputs to $f$ be chosen according to the uniform distribution. Let $X, Y$ denote the random variables corresponding to Alice's and Bob's inputs respectively, and $M_A, M_B$ denote the random variables corresponding to Alice's and Bob's messages respectively. Suppose $I(X : M_A) \le a$ and $I(Y : M_B) \le b$. Then, there exist sets $\mathrm{Good}_A, \mathrm{Good}_B \subseteq \{0,1\}^n$ such that $|\mathrm{Good}_A| \ge \frac{2}{3} \cdot 2^n$ and $|\mathrm{Good}_B| \ge \frac{2}{3} \cdot 2^n$, and a private coin simultaneous message protocol $\Pi'$ with the following properties:

(a) In $\Pi'$, Alice sends messages of length at most $\frac{3a+1}{\epsilon} + \log(n+1) + \log\frac{1}{\epsilon^2(1-\epsilon)} + 4$ bits and Bob sends messages of length at most $\frac{3b+1}{\epsilon} + \log(n+1) + \log\frac{1}{\epsilon^2(1-\epsilon)} + 4$ bits.

(b) For each input $(x, y) \in \mathrm{Good}_A \times \mathrm{Good}_B$, the error probability of $\Pi'$ is at most $\delta + 4\epsilon$.



Proof: Let $P$ be the distribution of $M_A$, and let $P_x$ be its distribution under the condition $X = x$. Note that by Fact 2 we have $\mathbb{E}_x[S(P_x \| P)] \le a$, where the expectation is taken by choosing $x$ uniformly from $\{0,1\}^n$. Therefore, by Markov's inequality, there exists a set $\mathrm{Good}_A$, $|\mathrm{Good}_A| \ge \frac{2}{3} \cdot 2^n$, such that for all $x \in \mathrm{Good}_A$, $S(P_x \| P) \le 3a$.

Define $t_a = \frac{8(n+1)2^{(3a+1)/\epsilon}}{\epsilon^2(1-\epsilon)}$. From Lemma 3 (with $N = 2^n$, so that $\log(2N) = n+1$), we know that there is a sequence of messages $\sigma = (m_1, \ldots, m_{t_a})$ and subsequences $\sigma_x$ of $\sigma$ such that on input $x \in \mathrm{Good}_A$, if Alice sends a uniformly chosen random message of $\sigma_x$ instead of sending messages according to distribution $P_x$, the probability of error for any $y \in \{0,1\}^n$ changes by at most $2\epsilon$. We now define an intermediate protocol $\Pi''$ as follows. The messages in $\sigma$ are encoded using at most $\log t_a + 1$ bits. In protocol $\Pi''$, for $x \in \mathrm{Good}_A$, Alice sends a uniformly chosen random message from $\sigma_x$; for $x \notin \mathrm{Good}_A$, Alice sends a fixed arbitrary message from $\sigma$. Bob's strategy in $\Pi''$ is the same as in $\Pi$. In $\Pi''$, the error probability of an input $(x, y) \in \mathrm{Good}_A \times \{0,1\}^n$ is at most $\delta + 2\epsilon$, and $I(Y : M_B) \le b$. Now, arguing similarly, the protocol $\Pi''$ can be converted to a protocol $\Pi'$ by compressing Bob's message to at most $\log t_b + 1$ bits, where $t_b = \frac{8(n+1)2^{(3b+1)/\epsilon}}{\epsilon^2(1-\epsilon)}$; the error for an input $(x, y) \in \mathrm{Good}_A \times \mathrm{Good}_B$ is at most $\delta + 4\epsilon$. ■
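The message-length bound in Theorem 1(a) is just $\log t_a + 1$ written out. A small Python check of this arithmetic, assuming logarithms are base 2; the helper names are illustrative only:

```python
import math

def t_budget(a, n, eps):
    """t_a = 8 (n+1) 2^((3a+1)/eps) / (eps^2 (1-eps)) from the proof of Theorem 1."""
    return 8 * (n + 1) * 2 ** ((3 * a + 1) / eps) / (eps ** 2 * (1 - eps))

def index_bits(a, n, eps):
    """log2 t_a + 1 (bits to name a position in sigma), computed in log space."""
    return (3 * a + 1) / eps + math.log2(8 * (n + 1)) - math.log2(eps ** 2 * (1 - eps)) + 1

def theorem1_bound(a, n, eps):
    """Right-hand side of Theorem 1(a) for Alice's messages."""
    return (3 * a + 1) / eps + math.log2(n + 1) + math.log2(1 / (eps ** 2 * (1 - eps))) + 4
```

The constant 4 arises as $\log 8 = 3$ from the factor 8 in $t_a$, plus one bit for rounding up.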



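Corollary 1 Let $\delta, \epsilon > 0$. Let $f : \{0,1\}^n \times \{0,1\}^n \to Z$ be a function. Let the inputs to $f$ be chosen according to the uniform distribution. Then there exist sets $\mathrm{Good}_A, \mathrm{Good}_B \subseteq \{0,1\}^n$ such that $|\mathrm{Good}_A| \ge \frac{2}{3} \cdot 2^n$, $|\mathrm{Good}_B| \ge \frac{2}{3} \cdot 2^n$, and

$$\mathrm{IC}^{\parallel}_{\delta}(f) \ge \frac{\epsilon}{3}\left(R^{\parallel}_{\delta+4\epsilon}(f') - 2\log(n+1) - 2\log\frac{1}{\epsilon^2(1-\epsilon)} - \frac{2}{\epsilon} - 8\right),$$

where $f'$ is the restriction of $f$ to $\mathrm{Good}_A \times \mathrm{Good}_B$.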

We can now prove the key theorem of Chakrabarti et al. [CSWY01].

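Theorem 2 (Direct sum, simultaneous messages) Let $\delta, \epsilon > 0$. Let $f : \{0,1\}^n \times \{0,1\}^n \to Z$ be a function. Define $\tilde R^{\parallel}_{\delta+4\epsilon}(f) = \min_{f'} R^{\parallel}_{\delta+4\epsilon}(f')$, where the minimum is taken over all functions $f'$ which are the restrictions of $f$ to sets of the form $A \times B$, $A, B \subseteq \{0,1\}^n$, $|A| \ge \frac{2}{3} \cdot 2^n$, $|B| \ge \frac{2}{3} \cdot 2^n$. Then,

$$R^{\parallel}_{\delta}(f^m) \ge \frac{m\epsilon}{3}\left(\tilde R^{\parallel}_{\delta+4\epsilon}(f) - 2\log(n+1) - 2\log\frac{1}{\epsilon^2(1-\epsilon)} - \frac{2}{\epsilon} - 8\right).$$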

Proof: Immediate from Fact ??, Fact 6 and Corollary 1. ■
Remarks:

1. The above theorem implies lower bounds for the simultaneous direct sum complexity of equality, as well as lower bounds for some related problems as in Chakrabarti et al. [CSWY01]. The dependence of the bounds on $\epsilon$ is better in our version.

2. A very similar direct sum theorem can be proved about two-party one-round private coin protocols.

3. All the results in this section, including the above remark, hold even when $f$ is a relation.



4 Two-party multiple-round protocols 

We first prove Lemma 4, which intuitively shows that if $P, Q$ are probability distributions on $[k]$ such that $P \le 2^a Q$, then it is enough to sample $Q$ independently about $2^a$ times to produce one sample element $Y$ according to $P$. In the statement of the lemma, the random variable $X$ represents an infinite sequence of independent sample elements chosen according to $Q$, and the random variable $R$ indicates how many of these elements have to be considered till 'stopping'. $R = \infty$ indicates that we do not 'stop'. If we do 'stop', then either we succeed in producing a sample according to $P$ (in this case, the sample $Y = X_R$), or we give up (in this case, we set $Y = 0$). In the proof of the lemma, $\star$ indicates that we do not 'stop' at the current iteration and hence the rejection sampling process must go further.

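Lemma 4 Let $P$ and $Q$ be probability distributions on $[k]$, such that $\mathrm{Good} = \{i \in [k] : 2^{-a}P(i) \le Q(i)\}$ has probability exactly $1 - \epsilon$ in $P$. Then, there exist correlated random variables $X = (X_i)_{i \in \mathbb{N}^+}$, $R$ and $Y$ such that

(a) the random variables $(X_i : i \in \mathbb{N}^+)$ are independent and each has distribution $Q$;

(b) $R$ takes values in $\mathbb{N}^+ \cup \{\infty\}$ and $\mathbb{E}[R] = 2^a$;

(c) if $R \ne \infty$, then $Y = X_R$ or $Y = 0$;

(d) $Y$ takes values in $\{0\} \cup [k]$, such that
$$\Pr[Y = i] = \begin{cases} P(i) & \text{if } i \in \mathrm{Good}, \\ 0 & \text{if } i \in [k] - \mathrm{Good}, \\ \epsilon & \text{if } i = 0. \end{cases}$$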



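Proof: First, we define a pair of correlated random variables $(X, Z)$, where $X$ takes values in $[k]$ and $Z$ in $[k] \cup \{0, \star\}$. Let $P' : [k] \to [0,1]$ be defined by $P'(i) = P(i)$ for $i \in \mathrm{Good}$, and $P'(i) = 0$ for $i \in [k] - \mathrm{Good}$. Let $\beta = \epsilon 2^{-a}/(1 - (1-\epsilon)2^{-a})$ and $\gamma_i = P'(i)2^{-a}/Q(i)$. The joint probability distribution of $X$ and $Z$ is given by

$$\forall i \in [k], \quad \Pr[X = i] = Q(i) \quad\text{and}\quad \Pr[Z = j \mid X = i] = \begin{cases} \gamma_i & \text{if } j = i, \\ \beta(1 - \gamma_i) & \text{if } j = 0, \\ 1 - \gamma_i - \beta(1 - \gamma_i) & \text{if } j = \star, \\ 0 & \text{otherwise.} \end{cases}$$

Note that this implies that

$$\Pr[Z \ne \star] = \sum_{i \in [k]} Q(i)\left(\gamma_i + \beta(1 - \gamma_i)\right) = \beta + (1 - \beta)\sum_{i \in [k]} Q(i)\gamma_i = \beta + (1 - \beta)(1 - \epsilon)2^{-a} = 2^{-a}.$$

Now, consider the sequence of random variables $X = (X_i)_{i \in \mathbb{N}^+}$ and $Z = (Z_i)_{i \in \mathbb{N}^+}$, where each $(X_i, Z_i)$ has the same distribution as $(X, Z)$ defined above and $(X_i, Z_i)$ is independent of all $(X_j, Z_j)$, $j \ne i$. Let $R = \min\{i : Z_i \ne \star\}$; $R = \infty$ if $\{i : Z_i \ne \star\}$ is the empty set. $R$ is a geometric random variable with success probability $2^{-a}$, and so satisfies part (b) of the present lemma. Let $Y = Z_R$ if $R \ne \infty$ and $Y = 0$ if $R = \infty$. Parts (a) and (c) are satisfied by construction.

We now verify that part (d) is satisfied. Since $\Pr[R = \infty] = 0$, we see that

$$\Pr[Y = i] = \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \Pr[Z_r = i \mid R = r] = \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \Pr[Z_r = i \mid Z_r \ne \star],$$

where the second equality follows from the independence of $(X_r, Z_r)$ from all $(X_j, Z_j)$, $j \ne r$. If $i \in [k]$, we see that

$$\Pr[Y = i] = \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \frac{\Pr[X_r = i] \cdot \Pr[Z_r = i \mid X_r = i]}{\Pr[Z_r \ne \star]} = \sum_{r \in \mathbb{N}^+} \Pr[R = r]\, P'(i) = P'(i).$$

Thus, for $i \in \mathrm{Good}$, $\Pr[Y = i] = P(i)$, and for $i \in [k] - \mathrm{Good}$, $\Pr[Y = i] = 0$. Finally,

$$\begin{aligned} \Pr[Y = 0] &= \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \frac{\Pr[Z_r = 0]}{\Pr[Z_r \ne \star]} = \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \frac{\sum_{j \in [k]} \Pr[X_r = j] \cdot \Pr[Z_r = 0 \mid X_r = j]}{2^{-a}} \\ &= \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \frac{\beta\left(1 - (1 - \epsilon)2^{-a}\right)}{2^{-a}} = \sum_{r \in \mathbb{N}^+} \Pr[R = r] \cdot \epsilon = \epsilon. \end{aligned}$$

■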
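Because each iteration of the process in Lemma 4 is independent and identically distributed, the law of $Y$ is the law of $Z$ conditioned on $Z \ne \star$. The Python sketch below builds the one-step joint law of $(X, Z)$ exactly as in the proof (with exact rationals) and checks $\Pr[Z \ne \star] = 2^{-a}$, $\Pr[Y = i] = P'(i)$ and $\Pr[Y = 0] = \epsilon$. The 0-indexed encoding of $[k]$ and the function names are assumptions of this illustration.

```python
from fractions import Fraction

STAR = "star"  # the outcome 'do not stop at this iteration'

def lemma4_one_step(P, Q, a):
    """Joint law of (X, Z) from the proof of Lemma 4, as a dict {(i, z): probability}.
    P, Q: lists of Fractions indexed 0..k-1; a: integer >= 1.
    Elements are 0-indexed here, so Z = i + 1 encodes 'output element i' and Z = 0
    stays the dummy value, matching Z in [k] u {0, star} in the text."""
    scale = Fraction(1, 2 ** a)                        # 2^{-a}
    good = [scale * p <= q for p, q in zip(P, Q)]
    eps = 1 - sum(p for p, g in zip(P, good) if g)    # P-mass outside Good
    P_prime = [p if g else Fraction(0) for p, g in zip(P, good)]
    beta = eps * scale / (1 - (1 - eps) * scale)
    joint = {}
    for i, q in enumerate(Q):
        if q == 0:
            continue                                   # X never takes this value
        gamma = P_prime[i] * scale / q
        joint[(i, i + 1)] = q * gamma
        joint[(i, 0)] = q * beta * (1 - gamma)
        joint[(i, STAR)] = q * (1 - gamma - beta * (1 - gamma))
    return joint, eps

def lemma4_output_law(P, Q, a):
    """Law of Y: the iterations are i.i.d., so Y is Z conditioned on Z != star
    (Pr[R = infinity] = 0). Returns (Pr[Z != star], {y: Pr[Y = y]}, eps)."""
    joint, eps = lemma4_one_step(P, Q, a)
    stop = sum(v for (_, z), v in joint.items() if z != STAR)
    law = {}
    for (_, z), v in joint.items():
        if z != STAR:
            law[z] = law.get(z, Fraction(0)) + v / stop
    return stop, law, eps
```

In the test example, the first element violates $2^{-a}P(i) \le Q(i)$, so it falls outside $\mathrm{Good}$ and its mass $\epsilon = 1/2$ is transferred to the dummy outcome.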

Lemma 5 follows from Lemma 4 and will be used to prove the message compression result for two-party multiple-round protocols (Theorem 3).

Lemma 5 Let $Q$ and $P_1, \ldots, P_N$ be probability distributions on $[k]$. Define $S(P_i \| Q) = a_i$. Suppose $a_i < \infty$ for all $i \in [N]$. Fix $\epsilon \in (0,1]$. Then, there exist random variables $X = (X_j)_{j \in \mathbb{N}^+}$, $R_1, \ldots, R_N$ and $Y_1, \ldots, Y_N$ such that

(a) $(X_i : i \in \mathbb{N}^+)$ are independent random variables, each having distribution $Q$;

(b) $R_j$ takes values in $\mathbb{N}^+ \cup \{\infty\}$ and $\mathbb{E}[R_j] = 2^{(a_j+1)/\epsilon}$;

(c) $Y_j$ takes values in $[k] \cup \{0\}$, and there is a set $\mathrm{Good}_j \subseteq [k]$ with $P_j(\mathrm{Good}_j) \ge 1 - \epsilon$ such that for all $\ell \in \mathrm{Good}_j$, $\Pr[Y_j = \ell] = P_j(\ell)$, for all $\ell \in [k] - \mathrm{Good}_j$, $\Pr[Y_j = \ell] = 0$, and $\Pr[Y_j = 0] = 1 - P_j(\mathrm{Good}_j) \le \epsilon$;

(d) if $R_j < \infty$, then $Y_j = X_{R_j}$ or $Y_j = 0$.

Proof: Using part (a) of Fact 3, we obtain for $j = 1, \ldots, N$ a set $\mathrm{Good}_j \subseteq [k]$ such that $P_j(\mathrm{Good}_j) \ge 1 - \epsilon$ and $P_j(i)2^{-(a_j+1)/\epsilon} \le Q(i)$ for all $i \in \mathrm{Good}_j$. Now from Lemma 4, we can construct correlated random variables $X$, $Y_1, \ldots, Y_N$ and $R_1, \ldots, R_N$ satisfying the requirements of the present lemma. ■

Theorem 3 (Compression result, multiple rounds) Suppose $\Pi$ is a $k$-round private coin randomised protocol for $f : \mathcal{X} \times \mathcal{Y} \to Z$. Let the average error of $\Pi$ under a probability distribution $\mu$ on the inputs $\mathcal{X} \times \mathcal{Y}$ be $\delta$. Let $X, Y$ denote the random variables corresponding to Alice's and Bob's inputs respectively. Let $T$ denote the complete transcript of messages sent by Alice and Bob. Suppose $I(XY : T) \le a$. Let $\epsilon > 0$. Then, there is another deterministic protocol $\Pi'$ with the following properties:

(a) The communication cost of $\Pi'$ is at most $\frac{2k(a+1)}{\epsilon^2} + \frac{2k}{\epsilon}$ bits.

(b) The distributional error of $\Pi'$ under $\mu$ is at most $\delta + 2\epsilon$.

Proof: The proof proceeds by defining a series of intermediate $k$-round protocols $\Pi'_k, \Pi'_{k-1}, \ldots, \Pi'_1$. $\Pi'_i$ is obtained from $\Pi'_{i+1}$ by compressing the message of the $i$th round. Thus, we first compress the $k$th message, then the $(k-1)$th message, and so on. Each message compression step introduces an additional additive error of at most $\epsilon/k$ for every input $(x, y)$. Protocol $\Pi'_i$ uses private coins for the first $i - 1$ rounds, and public coins for rounds $i$ to $k$. In fact, $\Pi'_i$ behaves the same as $\Pi$ for the first $i - 1$ rounds. Let $\Pi'_{k+1}$ denote the original protocol $\Pi$.

We now describe the construction of $\Pi'_i$ from $\Pi'_{i+1}$. Suppose the $i$th message in $\Pi'_{i+1}$ is sent by Alice. Let $M$ denote the random variable corresponding to the first $i$ messages in $\Pi'_{i+1}$. $M$ can be expressed as $(M_1, M_2)$, where $M_2$ represents the random variable corresponding to the $i$th message and $M_1$ represents the random variable corresponding to the initial $i - 1$ messages. From Fact ?? (note that the distributions below are as in protocol $\Pi'_{i+1}$ with the input distributed according to $\mu$),

$$I(XY : M) = I(XY : M_1) + \mathbb{E}_{m_1 \leftarrow M_1}\left[I(XY : M_2 \mid M_1 = m_1)\right] = I(XY : M_1) + \mathbb{E}_{m_1xy \leftarrow M_1XY}\left[S(M_2^{xym_1} \| M_2^{m_1})\right],$$

where $M_2^{xym_1}$ denotes the distribution of $M_2$ when $(X, Y) = (x, y)$ and $M_1 = m_1$, and $M_2^{m_1}$ denotes the distribution of $M_2$ when $M_1 = m_1$. Note that the distribution $M_2^{xym_1}$ is independent of $y$, as $\Pi'_{i+1}$ is private coin up to the $i$th round. Define $a_i = \mathbb{E}_{m_1xy \leftarrow M_1XY}[S(M_2^{xym_1} \| M_2^{m_1})]$.

Protocol $\Pi'_i$ behaves the same as $\Pi'_{i+1}$ for the first $i - 1$ rounds; hence $\Pi'_i$ behaves the same as $\Pi$ for the first $i - 1$ rounds. In particular, it is private coin for the first $i - 1$ rounds. Alice generates the $i$th message of $\Pi'_i$ using a fresh public coin $C_i$ as follows. For each distribution $M_2^{m_1}$, $m_1$ ranging over all possible initial $i - 1$ messages, $C_i$ stores an infinite sequence $\gamma^{m_1} = (\gamma^{m_1}_j)_{j \in \mathbb{N}^+}$, where $(\gamma^{m_1}_j : j \in \mathbb{N}^+)$ are chosen independently from distribution $M_2^{m_1}$. Note that the distribution $M_2^{m_1}$ is known to both Alice and Bob as $m_1$ is known to both of them; so both Alice and Bob know which part of $C_i$ to 'look' at in order to read from the infinite sequence $\gamma^{m_1}$. Using Lemma 5 (with $\epsilon/k$ in place of $\epsilon$), Alice generates the $i$th message of $\Pi'_i$, which is either $\gamma^{m_1}_j$ for some $j$, or the dummy message 0. The probability of generating 0 is at most $\epsilon/k$. If Alice does not generate 0, her message lies in a set $\mathrm{Good}_{xm_1}$ which has probability at least $1 - \epsilon/k$ in the distribution $M_2^{xym_1}$. The probability of a message $m_2 \in \mathrm{Good}_{xm_1}$ being generated is exactly the same as the probability of $m_2$ in $M_2^{xym_1}$. The expected value of $j$ is $2^{k(S(M_2^{xym_1} \| M_2^{m_1}) + 1)/\epsilon}$. Actually, Alice just sends the value of $j$ or the dummy message to Bob, using a prefix-free encoding, as the $i$th message of $\Pi'_i$. After Alice sends off the $i$th message, $\Pi'_i$ behaves the same as $\Pi'_{i+1}$ for rounds $i + 1$ to $k$. In particular, the coin $C_i$ is not 'used' for rounds $i + 1$ to $k$; instead, the public coins of $\Pi'_{i+1}$ are 'used' henceforth.
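The point of the public coin $C_i$ is that Alice only has to transmit an index $j$: Bob replays the same shared randomness and recovers $\gamma^{m_1}_j$ himself. A minimal Python sketch of one compressed round, assuming the public coin is modelled as a `random.Random` seeded identically on both sides and that Alice's accept/give-up decisions follow the one-step rule from the proof of Lemma 4. The names `alice_index` and `bob_decode`, and the `max_steps` cut-off, are hypothetical.

```python
import random

def alice_index(P, Q, a, shared_seed, private_rng, max_steps=10**6):
    """Alice's side of one compressed round (sketch of the Lemma 4/5 procedure).
    P: Alice's message distribution M_2^{xym_1}; Q: the common distribution M_2^{m_1};
    a: threshold exponent; both P and Q are lists of floats on range(k).
    The public coin is modelled as random.Random(shared_seed), read identically by both
    players; Alice's acceptance coins come from her own private_rng.
    Returns j >= 1 (Bob decodes gamma_j), or 0 for the dummy message."""
    k = len(Q)
    scale = 2.0 ** -a
    good = [scale * p <= q for p, q in zip(P, Q)]
    eps = 1.0 - sum(p for p, g in zip(P, good) if g)
    beta = eps * scale / (1.0 - (1.0 - eps) * scale)
    public = random.Random(shared_seed)
    for j in range(1, max_steps + 1):
        x = public.choices(range(k), weights=Q)[0]  # gamma_j ~ Q, read from the public coin
        gamma = (P[x] * scale / Q[x]) if good[x] else 0.0
        u = private_rng.random()
        if u < gamma:
            return j                     # accept: the message is gamma_j
        if u < gamma + beta * (1.0 - gamma):
            return 0                     # give up: dummy message
    return 0  # cut-off for the sketch only; R is finite with probability 1

def bob_decode(j, Q, shared_seed):
    """Bob replays the public coin to recover gamma_j."""
    public = random.Random(shared_seed)
    x = None
    for _ in range(j):
        x = public.choices(range(len(Q)), weights=Q)[0]
    return x
```

Both players consume the public generator in exactly the same way (one draw per index), which is what makes Bob's replay agree with the element Alice accepted.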

By the concavity of the logarithm function, the expected length of the $i$th message of $\Pi'_i$ is at most $2k\epsilon^{-1}(S(M_2^{xym_1} \| M_2^{m_1}) + 1) + 2$ bits for each $(x, y, m_1)$. (The multiplicative and additive factors of 2 are there to take care of the prefix-free encoding.) Also in $\Pi'_i$, for each $(x, y, m_1)$, the expected length (averaged over the public coins of $\Pi'_i$, which in particular include $C_i$ and the public coins of $\Pi'_{i+1}$) of the $(i+1)$th to $k$th messages does not increase as compared to the expected length (averaged over the public coins of $\Pi'_{i+1}$) of the $(i+1)$th to $k$th messages in $\Pi'_{i+1}$. This is because in the $i$th round of $\Pi'_i$, the probability of any non-dummy message does not increase as compared to that in $\Pi'_{i+1}$, and if the dummy message is sent in the $i$th round, $\Pi'_i$ aborts immediately. For the same reason, the increase in the error from $\Pi'_{i+1}$ to $\Pi'_i$ is at most an additive term of $\epsilon/k$ for each $(x, y, m_1)$. Thus the expected length, averaged over the inputs and public and private coin tosses, of the $i$th message in $\Pi'_i$ is at most $2k\epsilon^{-1}(a_i + 1) + 2$ bits. Also, the average error of $\Pi'_i$ under input distribution $\mu$ increases by at most an additive term of $\epsilon/k$.
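The factors of 2 in the length bound come from the prefix-free encoding of $j$, and the step from $\mathbb{E}[\log R]$ to $\log \mathbb{E}[R]$ is Jensen's inequality for the concave logarithm. The Python sketch below assumes one concrete encoding (a leading flag bit, then the Elias gamma code of $j$); the paper does not fix a particular code, so this choice is illustrative.

```python
import math

def encode_message(j):
    """Prefix-free code for the i-th message: '0' for the dummy message,
    otherwise '1' followed by the Elias gamma code of j >= 1.
    Length is 2*floor(log2 j) + 2 <= 2*log2(j) + 2 bits."""
    if j == 0:
        return "0"
    b = bin(j)[2:]                       # binary expansion of j, leading bit 1
    return "1" + "0" * (len(b) - 1) + b  # Elias gamma: (len - 1) zeros, then j

def decode_message(bits):
    """Inverse of encode_message for a single codeword."""
    if bits[0] == "0":
        return 0
    zeros = 0
    while bits[1 + zeros] == "0":
        zeros += 1
    return int(bits[1 + zeros: 2 + 2 * zeros], 2)

def expected_log2_geometric(p, terms=100000):
    """E[log2 R] for R ~ Geometric(p) on {1, 2, ...}, by truncated summation."""
    return sum((1 - p) ** (r - 1) * p * math.log2(r) for r in range(1, terms + 1))
```

With $R$ geometric with mean $1/p$, `expected_log2_geometric(p)` never exceeds $\log_2(1/p)$, so the expected codeword length is at most $2\log_2 \mathbb{E}[R] + 2$.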

By Fact ??, $\sum_{i=1}^k a_i = I(XY : T) \le a$, where $I(XY : T)$ is the mutual information in the original protocol $\Pi$. This is because the quantity $\mathbb{E}_{m_1xy \leftarrow M_1XY}[S(M_2^{xym_1} \| M_2^{m_1})]$ is the same irrespective of whether it is calculated for protocol $\Pi$ or protocol $\Pi'_{i+1}$, as $\Pi'_{i+1}$ behaves the same as $\Pi$ for the first $i$ rounds. Doing the above 'compression' procedure $k$ times gives us a public coin protocol $\Pi'_1$ such that the expected communication cost (averaged over the inputs as well as all the public coins of $\Pi'_1$) of $\Pi'_1$ is at most $2k\epsilon^{-1}(a + 1) + 2k$, and the average error of $\Pi'_1$ under input distribution $\mu$ is at most $\delta + \epsilon$. By restricting the maximum communication to $2k\epsilon^{-2}(a + 1) + 2k\epsilon^{-1}$ bits and applying Markov's inequality, we get a public coin protocol $\Pi''$ from $\Pi'_1$ which has average error under input distribution $\mu$ at most $\delta + 2\epsilon$. By setting the public coin tosses to a suitable value, we get a deterministic protocol $\Pi'$ from $\Pi''$ where the maximum communication is at most $2k\epsilon^{-2}(a + 1) + 2k\epsilon^{-1}$ bits, and the distributional error under $\mu$ is at most $\delta + 2\epsilon$. ■

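Corollary 2 Let $f : \mathcal{X} \times \mathcal{Y} \to Z$ be a function. Let $\mu$ be a product distribution on the inputs $\mathcal{X} \times \mathcal{Y}$. Let $\delta, \epsilon > 0$. Then, $\mathrm{IC}^{\mu,k}_{\delta}(f) \ge \frac{\epsilon^2}{2k} \cdot D^{\mu,k}_{\delta+2\epsilon}(f) - 2$.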

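Theorem 4 (Direct sum, $k$-round) Let $m, k$ be positive integers, and $\epsilon, \delta > 0$. Let $f : \mathcal{X} \times \mathcal{Y} \to Z$ be a function. Then, $R^{k}_{\delta}(f^m) \ge m \cdot \sup_{\mu,\kappa}\left(\frac{\epsilon^2}{2k} \cdot D^{\mu,k}_{\delta+2\epsilon}(f) - 2 - H(\kappa)\right)$, where the supremum is over all probability distributions $\mu$ on $\mathcal{X} \times \mathcal{Y}$ and partitions $\kappa$ of $\mu$.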

Proof: Immediate from Fact ??, Fact ?? and Corollary 2. ■

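Corollary 3 Let $m, k$ be positive integers, and $\epsilon, \delta > 0$. Let $f : \mathcal{X} \times \mathcal{Y} \to Z$ be a function. Then, $R^{k}_{\delta}(f^m) \ge m \cdot \left(\frac{\epsilon^2}{2k} \cdot D^{[\,],k}_{\delta+2\epsilon}(f) - 2\right)$.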

Remarks: 

1. Note that all the results in this section hold even when $f$ is a relation.
2. The above corollary implies that the direct sum property holds for constant round protocols for the pointer jumping problem with the 'wrong' player starting (the bit version, the full pointer version and the tree version), since the product distributional complexity (in fact, for the uniform distribution) of pointer jumping is the same as its randomised complexity [NW93, PRV01].



5 Impossibility of quantum compression 

In this section, we show that the information cost based message compression approach does not work in 
the quantum setting. We first need some preliminary definitions and lemmas. 

Lemma 6 Fix positive integers $d, m$ and real $\epsilon > 0$. Then there is a set $S$ of at most $d$-dimensional subspaces of $\mathbb{C}^m$ such that

(a) $|S| \le (8\sqrt{d}/\epsilon)^{2md}$.

(b) For all $d$-dimensional subspaces $W$ of $\mathbb{C}^m$, there is an at most $d$-dimensional subspace $\tilde W \in S$ such that $\Delta(W, \tilde W) \le \epsilon$.

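Proof: Let $\mathcal{N}$ be a $\delta$-dense subset of the set of unit vectors of $\mathbb{C}^m$ satisfying Fact 9. For a unit vector $v \in \mathbb{C}^m$, let $\tilde v$ denote the vector in $\mathcal{N}$ closest to it (ties are broken arbitrarily). Let $W$ be a subspace of $\mathbb{C}^m$ of dimension $d$. Let $w = \sum_{i=1}^d \alpha_i w_i$ be a unit vector in $W$, where $\{w_1, \ldots, w_d\}$ is an orthonormal basis for $W$ and $\sum_{i=1}^d |\alpha_i|^2 = 1$. Define $w' = \sum_{i=1}^d \alpha_i \tilde w_i$, and $\hat w = w'/\|w'\|$ if $w' \ne 0$, $\hat w = 0$ if $w' = 0$. It is now easy to verify the following.

(a) $\|w - w'\| \le \delta\sqrt{d}$.

(b) $\|w'\| \ge 1 - \delta\sqrt{d}$.

(c) $\|w - \hat w\| \le 2\delta\sqrt{d}$.

Choose $\delta = \epsilon/(2\sqrt{d})$. Define $\tilde W$ to be the subspace spanned by the set $\{\tilde w_1, \ldots, \tilde w_d\}$; $\dim(\tilde W) \le d$. By Fact ?? and (c) above, $\Delta(W, \tilde W) \le \epsilon$. Define $S = \{\tilde W : W \text{ subspace of } \mathbb{C}^m \text{ of dimension } d\}$. $S$ satisfies part (b) of the present lemma. Also $|S| \le (4/\delta)^{2md} = (8\sqrt{d}/\epsilon)^{2md}$, thus proving part (a) of the present lemma. ■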
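Steps (a)-(c) in the proof of Lemma 6 only use that each $\tilde w_i$ is a unit vector within distance $\delta$ of $w_i$. The NumPy sketch below checks these three inequalities numerically; as an assumption of the illustration, a random nearby unit vector stands in for the closest point of the $\delta$-dense net, which is not constructed here. The function name `rounding_errors` is hypothetical.

```python
import numpy as np

def rounding_errors(m, d, delta, rng):
    """Return (||w - w'||, ||w'||, ||w - w_hat||) for the construction in Lemma 6.
    Each basis vector w_i is replaced by a unit vector tilde_w_i with
    ||w_i - tilde_w_i|| <= delta; a random nearby unit vector stands in for the net point."""
    # random orthonormal basis {w_1, ..., w_d} of a d-dimensional subspace of C^m
    A = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
    W, _ = np.linalg.qr(A)
    # perturb each column and renormalise, retrying until ||w_i - tilde_w_i|| <= delta
    T = np.empty_like(W)
    for i in range(d):
        while True:
            noise = (rng.normal(size=m) + 1j * rng.normal(size=m)) / np.sqrt(2 * m)
            z = W[:, i] + (delta / 2) * noise
            z /= np.linalg.norm(z)
            if np.linalg.norm(z - W[:, i]) <= delta:
                T[:, i] = z
                break
    # a random unit vector w = sum_i alpha_i w_i and its rounded version w'
    alpha = rng.normal(size=d) + 1j * rng.normal(size=d)
    alpha /= np.linalg.norm(alpha)
    w = W @ alpha
    w_prime = T @ alpha
    w_hat = w_prime / np.linalg.norm(w_prime)
    return np.linalg.norm(w - w_prime), np.linalg.norm(w_prime), np.linalg.norm(w - w_hat)
```

All three bounds are deterministic consequences of the triangle and Cauchy-Schwarz inequalities, so the checks hold for every draw, not just on average.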
We next prove the following two propositions using Fact 10.



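Proposition 1 Let $m, d, l$ be positive integers such that $d < \ldots$ and $l < \ldots$. Let $V$ be a fixed subspace of $\mathbb{C}^m$ of dimension $m/l$. Let $P$ be the orthogonal projection operator on $V$. Let $(w, w')$ be an independently chosen random pair of unit vectors from $\mathbb{C}^m$. Then,

(a) $\Pr\left[|\langle w|w'\rangle| > \frac{1}{5d^2}\right] < 2\exp\left(-\frac{m}{100d^4}\right)$,

(b) $\Pr\left[\|Px\| > \frac{2}{\sqrt{l}}\right] < 2\exp\left(-\frac{m}{4l}\right)$, $x = w, w'$,

(c) $\Pr\left[|\langle Pw|Pw'\rangle| > \frac{4}{5d^2l}\right] < 6\exp\left(-\frac{m}{100d^4l}\right)$.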
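Proof: To prove the first inequality, we can assume by U(m)-invariance that $w' = e_1$. The map $w \mapsto |\langle w|e_1\rangle|$ is 1-Lipschitz, with expectation at most $1/\sqrt{m}$ by U(m)-symmetry and using convexity of the square function. By Fact 10,

$$\Pr\left[|\langle w|w'\rangle| > \frac{1}{5d^2}\right] \le \Pr\left[|\langle w|w'\rangle| > \frac{1}{\sqrt{m}} + \frac{12}{\sqrt{2m}} + \frac{1}{10d^2}\right] < 2\exp\left(-\frac{m}{100d^4}\right),$$

proving part (a) of the present proposition.

The argument for the second inequality is similar. By U(m)-symmetry and using convexity of the square function, $\mathbb{E}[\|Pw\|] = \mathbb{E}[\|Pw'\|] \le 1/\sqrt{l}$. Since the map $w \mapsto \|Pw\|$ is 1-Lipschitz, by Fact 10 we get that

$$\Pr\left[\|Px\| > \frac{2}{\sqrt{l}}\right] \le \Pr\left[\|Px\| > \frac{1}{\sqrt{l}} + \frac{12}{\sqrt{2m}} + \frac{1}{2\sqrt{l}}\right] < 2\exp\left(-\frac{m}{4l}\right), \qquad x = w, w',$$

proving part (b) of the present proposition.

We now prove part (c) of the present proposition. Let $\hat w = Pw/\|Pw\|$ and $\hat w' = Pw'/\|Pw'\|$ (note that $\|Pw\| = 0$ and $\|Pw'\| = 0$ are each zero probability events). By Fact 8, $\hat w, \hat w'$ are random independently chosen unit vectors in $V$. By the argument used in the proof of part (a) of the present proposition, we get that

$$\Pr\left[|\langle \hat w|\hat w'\rangle| > \frac{1}{5d^2}\right] < 2\exp\left(-\frac{m}{100d^4l}\right).$$

Now,

$$\begin{aligned} \Pr\left[|\langle Pw|Pw'\rangle| > \frac{4}{5d^2l}\right] &\le \Pr\left[|\langle \hat w|\hat w'\rangle| > \frac{1}{5d^2}\right] + \Pr\left[\|Pw\| > \frac{2}{\sqrt{l}}\right] + \Pr\left[\|Pw'\| > \frac{2}{\sqrt{l}}\right] \\ &< 2\exp\left(-\frac{m}{100d^4l}\right) + 4\exp\left(-\frac{m}{4l}\right) \le 6\exp\left(-\frac{m}{100d^4l}\right), \end{aligned}$$

proving part (c) of the present proposition.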
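Proposition 1 quantifies two concentration effects: random unit vectors in $\mathbb{C}^m$ are nearly orthogonal ($|\langle w|w'\rangle|$ of order $1/\sqrt{m}$), and a random unit vector has $\|Pw\|^2$ close to $\dim V/m = 1/l$. A small NumPy experiment illustrating both, under the assumption (justified by U(m)-invariance) that $V$ can be taken to be the span of the first $m/l$ coordinates; the function names are illustrative.

```python
import numpy as np

def random_unit_vectors(count, m, rng):
    """Rows are independent uniformly random unit vectors in C^m (normalised complex Gaussians)."""
    z = rng.normal(size=(count, m)) + 1j * rng.normal(size=(count, m))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def prop1_statistics(m, l, count, rng):
    """Empirical |<w|w'>| and ||Pw||^2 for P the projection onto the first m/l coordinates
    (by U(m)-invariance any fixed subspace of dimension m/l behaves the same way)."""
    w = random_unit_vectors(count, m, rng)
    w2 = random_unit_vectors(count, m, rng)
    overlaps = np.abs(np.sum(np.conj(w) * w2, axis=1))    # |<w|w'>| per pair
    proj_sq = np.sum(np.abs(w[:, : m // l]) ** 2, axis=1)  # ||Pw||^2, mean exactly 1/l
    return overlaps, proj_sq
```

With $m = 4096$ and $l = 16$, every sampled $\|Pw\|^2$ stays far below the threshold $4/l$ of part (b), and every overlap stays within a few multiples of $1/\sqrt{m}$.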



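Proposition 2 Let $m, d, l$ be positive integers such that $d < \ldots$ and $l < \ldots$. Let $V$ be a fixed subspace of $\mathbb{C}^m$ of dimension $m/l$. Let $P$ be the orthogonal projection operator on $V$. Let $(w, w')$ be a random pair of orthonormal vectors from $\mathbb{C}^m$. Then,

$$\Pr\left[|\langle w|P|w'\rangle| > \frac{2}{d^2l}\right] < 10\exp\left(-\frac{m}{100d^4l}\right).$$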

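Proof: By Fact 8, to generate a random pair of orthonormal vectors $(w, w')$ from $\mathbb{C}^m$ we can do as follows: first generate unit vectors $x, y \in \mathbb{C}^m$ randomly and independently, let $w'' = y - \langle x|y\rangle x$, and set $w = x$ and $w' = w''/\|w''\|$. Now (note that $\Pr[w'' = 0] = 0$),

$$|\langle w|P|w'\rangle| = \frac{|\langle w|P|w''\rangle|}{\|w''\|} \le \frac{|\langle x|P|y\rangle| + |\langle x|y\rangle|\,|\langle x|P|x\rangle|}{\sqrt{1 - |\langle x|y\rangle|^2}}.$$

By Proposition 1, we see that

$$\begin{aligned} \Pr\left[|\langle w|P|w'\rangle| > \frac{2}{d^2l}\right] &\le \Pr\left[|\langle w|P|w'\rangle| > \frac{4/(5d^2l) + (1/(5d^2)) \cdot (4/l)}{1 - 1/(5d^2)}\right] \\ &< 6\exp\left(-\frac{m}{100d^4l}\right) + 2\exp\left(-\frac{m}{100d^4}\right) + 2\exp\left(-\frac{m}{4l}\right) \le 10\exp\left(-\frac{m}{100d^4l}\right), \end{aligned}$$

proving the present proposition.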
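The proof of Proposition 2 generates a random orthonormal pair by one Gram-Schmidt step applied to two independent random unit vectors. A NumPy sketch of that construction, together with the quantity $|\langle w|P|w'\rangle|$ for $P$ the projection onto the first $m/l$ coordinates (an assumption that is harmless by U(m)-invariance); the helper names are hypothetical.

```python
import numpy as np

def random_orthonormal_pair(m, rng):
    """Construction from the proof of Proposition 2: independent random unit vectors
    x, y in C^m, then w = x and w' = (y - <x|y> x) / ||y - <x|y> x||."""
    def unit():
        z = rng.normal(size=m) + 1j * rng.normal(size=m)
        return z / np.linalg.norm(z)
    x, y = unit(), unit()
    w_dd = y - np.vdot(x, y) * x          # np.vdot conjugates its first argument: <x|y>
    return x, w_dd / np.linalg.norm(w_dd)

def projected_overlap(w, w_prime, dim_v):
    """|<w|P|w'>| for P the projection onto the first dim_v coordinates."""
    return abs(np.vdot(w[:dim_v], w_prime[:dim_v]))
```

In typical runs with $m$ in the thousands the observed values of $|\langle w|P|w'\rangle|$ are well below the threshold $2/(d^2l)$ of the proposition.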






Lemma 7 Let $m, d, l$ be positive integers such that $200d^4l\ln(20d^2) < m$. Let $V$ be a fixed subspace of $\mathbb{C}^m$ of dimension $m/l$. Let $P$ be the orthogonal projection operator on $V$. Let $W$ be a random subspace of $\mathbb{C}^m$ of dimension $d$. Then,

$$\Pr\left[\exists w \in W,\ \|w\| = 1 \text{ and } |\langle w|P|w\rangle| > 6/l\right] < \exp\left(-\frac{m}{200d^4l}\right).$$

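Proof: Let $(w_1, \ldots, w_d)$ be a randomly chosen ordered orthonormal set of size $d$ in $\mathbb{C}^m$, and let $W = \mathrm{Span}(w_1, \ldots, w_d)$. By Fact 8, each $w_i$ is a random unit vector of $\mathbb{C}^m$ and each $(w_i, w_j)$, $i \ne j$, is a random pair of orthonormal vectors of $\mathbb{C}^m$. By Propositions 1 and 2, we have with probability at least

$$1 - 2d\exp\left(-\frac{m}{4l}\right) - 10d^2\exp\left(-\frac{m}{100d^4l}\right)$$

that

$$\forall i,\ \langle w_i|P|w_i\rangle < \frac{4}{l} \quad\text{and}\quad \forall i \ne j,\ |\langle w_i|P|w_j\rangle| < \frac{2}{d^2l}.$$

We show that whenever this happens, $|\langle w|P|w\rangle| < 6/l$ for all $w \in W$, $\|w\| = 1$. Let $w = \sum_{i=1}^d \alpha_i w_i$, where $\sum_{i=1}^d |\alpha_i|^2 = 1$. Then,

$$|\langle w|P|w\rangle| = \left|\sum_{i,j} \alpha_i^*\alpha_j\langle w_i|P|w_j\rangle\right| \le \sum_i |\alpha_i|^2|\langle w_i|P|w_i\rangle| + \sum_{i \ne j}|\alpha_i^*\alpha_j|\,|\langle w_i|P|w_j\rangle| < \frac{4}{l} + d \cdot \frac{2}{d^2l} \le \frac{6}{l},$$

using $\sum_{i \ne j}|\alpha_i^*\alpha_j| \le (\sum_i |\alpha_i|)^2 \le d$. Thus,

$$\Pr\left[\exists w \in W,\ \|w\| = 1 \text{ and } |\langle w|P|w\rangle| > 6/l\right] < 2d\exp\left(-\frac{m}{4l}\right) + 10d^2\exp\left(-\frac{m}{100d^4l}\right) < \exp\left(-\frac{m}{200d^4l}\right),$$

completing the proof of the present lemma. ■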
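The event bounded in Lemma 7 concerns $\max_{w \in W, \|w\| = 1}\langle w|P|w\rangle$, which equals the largest eigenvalue of the $d \times d$ matrix $B^\dagger P B$ when the columns of $B$ form an orthonormal basis of $W$. The NumPy sketch below computes this quantity for random subspaces; note that the parameters in the test are far smaller than the hypothesis $200d^4l\ln(20d^2) < m$ requires, so it only illustrates the quantity, not the lemma's regime. The function name and the choice of $V$ as the first $m/l$ coordinates are assumptions.

```python
import numpy as np

def max_projection_on_subspace(m, d, dim_v, rng):
    """max over unit w in W of <w|P|w>, for W a random d-dimensional subspace of C^m and
    P the projection onto the first dim_v coordinates. With B an m x d matrix whose
    orthonormal columns span W, the maximum equals the largest eigenvalue of B^dagger P B."""
    A = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
    B, _ = np.linalg.qr(A)                 # orthonormal basis of a random d-dim subspace
    PB = B[:dim_v, :]                      # rows of P B that are not zeroed by P
    G = PB.conj().T @ PB                   # d x d Hermitian matrix B^dagger P B
    return np.linalg.eigvalsh(G).max()
```

In the test example ($m = 4096$, $d = 3$, $l = 8$) the maxima sit slightly above $1/l$, well below $6/l$.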
We can now prove the following 'incompressibility' theorem about (mixed) state compression in the 
quantum setting. 

Theorem 5 (Quantum incompressibility) Let $m, d, n$ be positive integers and $k$ a positive real number such that $k \ge 7$, $d \ge 160^2$, $1600d^4k2^k\ln(20d^2) < m$ and $3200 \cdot 2^{2k}d^5\ln d < n$. Let the underlying Hilbert space be $\mathbb{C}^m$. There exist $n$ states $\rho_l$ and $n$ orthogonal projections $M_l$, $1 \le l \le n$, such that

(a) $\forall l,\ \mathrm{Tr}\, M_l\rho_l = 1$.

(b) $\rho = \frac{1}{n}\sum_l \rho_l = \frac{I}{m}$, where $I$ is the identity operator on $\mathbb{C}^m$.

(c) $\forall l,\ S(\rho_l \| \rho) = k$.

(d) For all subspaces $W$ of dimension $d$, $|\{M_l : M_l(W) \le 1/10\}| \ge n/4$.
Proof: In the proof, we will index the $n$ states $\rho_l$, $1 \le l \le n$, as $\rho_{ij}$, $1 \le i \le n/2^k$, $1 \le j \le 2^k$. We will also index the $n$ orthogonal projections $M_l$ as $M_{ij}$. For $1 \le i \le n/2^k$, choose $B^i = (v^i_1, \ldots, v^i_m)$ to be a random ordered orthonormal basis of $\mathbb{C}^m$; $B^i$ is chosen independently of $B^{i'}$, $i' \ne i$. Partition the sequence $B^i$ into $2^k$ equal parts; call these parts $B^i_j$, $1 \le j \le 2^k$. Define $\rho_{ij} = \frac{2^k}{m}\sum_{v \in B^i_j}|v\rangle\langle v|$. Define $M_{ij} = \sum_{v \in B^i_j}|v\rangle\langle v|$. Define $V_{ij} = \mathrm{Span}(v : v \in B^i_j)$; $V_{ij}$ is the support of $\rho_{ij}$. It is easy to see that $\rho_{ij}, M_{ij}$ satisfy parts (a), (b) and (c) of the present theorem.
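Parts (a), (b) and (c) of Theorem 5 can be verified directly on a small instance of the construction above. The NumPy sketch below assumes an integer $k$ with $2^k$ dividing both $m$ and $n$ (the theorem allows real $k$), builds $\rho_{ij}$ and $M_{ij}$ from random bases, and computes $S(\rho_{ij} \| I/m) = \log m - H(\rho_{ij})$ in bits. The function names are illustrative only.

```python
import numpy as np

def incompressible_family(m, k, n, rng):
    """States rho_ij and projections M_ij from the proof of Theorem 5 (integer k, with
    2^k dividing m and n). Returns lists rhos, Ms in the order (i, j)."""
    parts = 2 ** k
    assert m % parts == 0 and n % parts == 0
    size = m // parts
    rhos, Ms = [], []
    for _ in range(n // parts):
        A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
        basis, _ = np.linalg.qr(A)                     # random ordered orthonormal basis B^i
        for j in range(parts):
            block = basis[:, j * size:(j + 1) * size]  # the part B^i_j
            M = block @ block.conj().T                 # projection onto V_ij
            Ms.append(M)
            rhos.append(M * parts / m)                 # rho_ij = (2^k / m) * M_ij
    return rhos, Ms

def relative_entropy_to_maximally_mixed(rho):
    """S(rho || I/m) in bits = log2 m - H(rho), with H the von Neumann entropy."""
    m = rho.shape[0]
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]                       # drop the (numerically) zero eigenvalues
    return np.log2(m) + np.sum(evals * np.log2(evals))
```

Each $\rho_{ij}$ is maximally mixed on an $m/2^k$-dimensional subspace, so its relative entropy to $I/m$ is exactly $k$ bits, while the states coming from one basis average to $I/m$.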

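To prove part (d), we reason as follows. Let $W$ be a fixed subspace of $\mathbb{C}^m$ of dimension $d$. Let $P_{ij}$ denote the orthogonal projection operator onto $V_{ij}$. By the U(m)-invariance of the distribution $\mu_{m,d}$ and from Lemma 7 (with $l = 2^k$), for each $i, j$,

$$\Pr\left[\exists w \in W,\ \|w\| = 1 \text{ and } \langle w|P_{ij}|w\rangle > \frac{6}{2^k}\right] < \exp\left(-\frac{m}{200 \cdot 2^kd^4}\right),$$

where the probability is over the random choice of the bases $B^i$, $1 \le i \le n/2^k$. Define the set

$$\mathrm{Bad} = \left\{i \in [n/2^k] : \exists j \in [2^k],\ M_{ij}(W) > \frac{6}{2^k}\right\}.$$

Hence for a fixed $i \in [n/2^k]$,

$$\Pr[i \in \mathrm{Bad}] < 2^k\exp\left(-\frac{m}{200 \cdot 2^kd^4}\right) < \exp\left(-\frac{m}{400 \cdot 2^kd^4}\right).$$

Since the events $i \in \mathrm{Bad}$ are independent,

$$\Pr\left[|\mathrm{Bad}| > \frac{3}{4} \cdot \frac{n}{2^k}\right] < \binom{n/2^k}{3n/2^{k+2}}\exp\left(-\frac{m}{400 \cdot 2^kd^4}\right)^{3n/2^{k+2}} \le \left(\frac{4e}{3}\right)^{3n/2^{k+2}}\exp\left(-\frac{3mn}{1600 \cdot 2^{2k}d^4}\right).$$

So,

$$\Pr\left[\left|\left\{M_{ij} : M_{ij}(W) > \frac{6}{2^k}\right\}\right| > \frac{3n}{4}\right] < \left(\frac{4e}{3}\right)^{3n/2^{k+2}}\exp\left(-\frac{3mn}{1600 \cdot 2^{2k}d^4}\right).$$

By setting $\epsilon = 1/20$ in Lemma 6, we get

$$\Pr\left[\exists \tilde W \in S,\ \left|\left\{M_{ij} : M_{ij}(\tilde W) > \frac{6}{2^k}\right\}\right| > \frac{3n}{4}\right] < |S| \cdot \left(\frac{4e}{3}\right)^{3n/2^{k+2}}\exp\left(-\frac{3mn}{1600 \cdot 2^{2k}d^4}\right) < 1$$

for the given constraints on the parameters. Again by Lemma 6, we get

$$\Pr\left[\exists W \text{ subspace of } \mathbb{C}^m,\ \dim(W) = d,\ \left|\left\{M_{ij} : M_{ij}(W) > \frac{1}{10}\right\}\right| > \frac{3n}{4}\right] \le \Pr\left[\exists \tilde W \in S,\ \left|\left\{M_{ij} : M_{ij}(\tilde W) > \frac{6}{2^k}\right\}\right| > \frac{3n}{4}\right] < 1.$$

This completes the proof of part (d) of the present theorem.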
6 Conclusion and open problems



In this paper, we have shown a compression theorem and a direct sum theorem for two-party multiple-round private coin protocols. Our proofs use the notion of information cost of a protocol. The main technical ingredient in our compression proof is a connection between relative entropy and sampling. It is an interesting open problem to strengthen this connection, so as to obtain better lower bounds for the direct sum problem for multiple-round protocols. In particular, can one improve the dependence on the number of rounds in the compression result (by information cost based methods or otherwise)?

We have also shown a strong negative result about the compressibility of quantum information. Our result seems to suggest that to tackle the direct sum problem in quantum communication, techniques other than information cost based message compression may be necessary. Buhrman et al. [BCWdW01] have shown that the bounded error simultaneous quantum complexity of $\mathrm{EQ}_n$ is $O(\log n)$, as opposed to $\Theta(\sqrt{n})$ in the classical setting [NS96, BK97]. An interesting open problem is whether the direct sum property holds for simultaneous quantum protocols for equality.